Fall 2008
Note: Events of interest to the
Database Research Group are posted to the uw.cs.database
newsgroup and are mailed to the
db-group@lists.uwaterloo.ca
mailing list. There are actually three mailing lists aggregated into the
db-group list: db-faculty
(for DB group faculty), db-grads (for DB group graduate students),
and db-friends (for DB group alumni, visitors, and friends). If
you wish to subscribe to one of these three lists (or to unsubscribe), please
visit
https://lists.uwaterloo.ca/mailman/listinfo/<listname>, where
<listname> is the name of the list (db-faculty, db-grads, or db-friends).
- DB group meetings
- The DB group meets most Friday afternoons at 2pm, usually in DC 1331.
See the list of current events for
times and locations of upcoming meetings. Each meeting lasts
for an hour and features an informal presentation by one of the
members of the group. Everyone is welcome to attend. These talks are
intended to raise questions and stimulate discussion rather than to be
polished presentations of research results. Speakers are determined
using a rotating speaker list, which can be found on the
DB group meeting page.
- DB seminar series
- The DB seminar series features visiting speakers. These seminars are held
more or less monthly and are usually scheduled on Monday
mornings at 11am. See the list of current
events for times and locations of upcoming seminars. The
full schedule can be found on the DB seminar series page.
Recent and Upcoming Events
DB Meeting: Friday September 12, 2:00pm, DC 1331
Speaker: Anil Goel
Title: Hello Flash, so long HD?
Abstract:
I will review the argument in favour of making widespread use of solid-state drives (SSDs)
for mainstream data storage, discuss recent work supporting that argument,
and pinpoint additional work needed in DBMS software.
References:
1. Latency Lags Bandwidth, Patterson, CACM, Oct 2004
(http://portal.acm.org/citation.cfm?id=1022596)
2. The five-minute rule twenty years later, and how flash memory changes the rules, Graefe, DaMoN 2007
(http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=549)
3. A case for flash memory SSD in enterprise database applications, Lee et al., SIGMOD 2008
(http://vldb.skku.ac.kr/vldb/sub_page/pub/SIGMOD2008-SSD-paper.pdf)
4. Flash storage today, Leventhal, ACM Queue, Jul/Aug 2008
(http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=547)
5. Design of flash based DBMS: An in-page logging approach, Lee et al., SIGMOD 2007
(http://vldb.skku.ac.kr/vldb/sub_page/pub/SIGMOD2007-IPL-paper.pdf)
6. Flash disk opportunity for server-applications, Gray et al.
(http://research.microsoft.com/~gray/papers/FlashDiskPublic.doc)
7. The five-minute rule ten years later, and other computer storage rules of thumb, Gray et al., SIGMOD 1997
(http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=P116)
8. Enterprise SSDs, Moshayedi et al., ACM Queue, Jul/Aug 2008
(http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=548)
9. Intel's X25-M SSD, The Tech Report, Sep 2008
(http://techreport.com/articles.x/15433)
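References 2 and 7 revolve around the five-minute rule, which asks how often a page must be re-referenced before keeping it in RAM is cheaper than re-reading it from the storage device. The 1997 paper states the break-even reference interval as (pages per MB of RAM / accesses per second per disk) × (price per disk drive / price per MB of RAM). A minimal Python sketch of that formula follows; the hardware figures in the usage block are illustrative assumptions, not numbers from the talk, and swapping in flash parameters for disk ones is roughly the kind of re-evaluation reference 2 carries out.

```python
def break_even_interval_s(pages_per_mb_ram: float,
                          accesses_per_s_per_device: float,
                          price_per_device: float,
                          price_per_mb_ram: float) -> float:
    """Break-even reference interval, in seconds, for caching a page in RAM.

    A page re-referenced at least this often is cheaper to keep in RAM
    than to re-read from the device on every access.
    """
    # "Technology" ratio: how many pages one unit of device bandwidth serves.
    technology_ratio = pages_per_mb_ram / accesses_per_s_per_device
    # "Economic" ratio: cost of one device relative to one MB of RAM.
    economic_ratio = price_per_device / price_per_mb_ram
    return technology_ratio * economic_ratio


def keep_in_ram(reference_interval_s: float, break_even_s: float) -> bool:
    # Cache the page if it is referenced at least once per break-even interval.
    return reference_interval_s <= break_even_s


if __name__ == "__main__":
    # Assumed, illustrative parameters: 8 KB pages (128 per MB), a disk doing
    # 64 random accesses per second at $2000, and RAM at $15 per MB.
    interval = break_even_interval_s(128, 64, 2000, 15)
    print(f"break-even interval: {interval:.0f} s")  # about 267 s, roughly five minutes
```

Cheaper devices per access (such as flash) shrink the economic ratio and raise the technology ratio's denominator, which is why the break-even interval moves when flash enters the hierarchy.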
DB Meeting: Friday September 26, 2:00pm, DC 1331
Speaker: Oguzhan Ozmen
Title: Database-Aware Storage Layout Design
Abstract:
Database design involves a number of steps performed by an expert, the
database administrator. Following logical and physical database design,
the very last step is to lay out database objects (e.g., tables,
indexes, and materialized views) on the underlying storage system. Storage
layout design consists of assigning each object to a subset of the
available storage devices, and the choice of how database objects
are assigned to storage devices can significantly affect the I/O
performance of a DBMS.
Layout decisions require information about the I/O behavior of the DBMS;
without it, they must rely on heuristics or generic guidelines. A
storage layout design that does not consider the I/O behavior and
requirements of the DBMS may ultimately result in suboptimal DBMS
performance. Currently, the I/O behavior of a storage client is measured
by collecting (observing) its I/O trace. In this talk, I will briefly
present our alternative method for producing a DBMS's I/O traces without
measuring them. I will then discuss how we use this information to
enable informed, database-aware layout design.
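Storage layout design, as described above, assigns each database object to a subset of the available devices. A minimal Python sketch of that assignment step, assuming each object's I/O rate is already known (in a database-aware design it would be derived from the DBMS's I/O trace). The greedy "heaviest object onto the least-loaded devices" rule, the function name, and the sample rates are all hypothetical and are not the speaker's method.

```python
from __future__ import annotations


def greedy_layout(object_iops: dict[str, float], num_devices: int,
                  stripe_width: int = 1) -> tuple[dict[str, list[int]], list[float]]:
    """Hypothetical greedy layout: place the heaviest objects first, striping
    each one evenly over the `stripe_width` currently least-loaded devices."""
    loads = [0.0] * num_devices
    layout: dict[str, list[int]] = {}
    for name, iops in sorted(object_iops.items(), key=lambda kv: kv[1], reverse=True):
        # Pick the least-loaded devices, breaking ties by device number.
        chosen = sorted(range(num_devices), key=lambda d: (loads[d], d))[:stripe_width]
        for d in chosen:
            loads[d] += iops / stripe_width
        layout[name] = sorted(chosen)
    return layout, loads


if __name__ == "__main__":
    # Assumed per-object I/O rates (I/Os per second).
    demand = {"lineitem": 900, "orders": 500, "orders_idx": 300, "customer": 200}
    layout, loads = greedy_layout(demand, num_devices=2)
    print(layout)  # {'lineitem': [0], 'orders': [1], 'orders_idx': [1], 'customer': [1]}
    print(loads)   # [900.0, 1000.0]
```

The sketch balances total load but puts `orders` and `orders_idx` on the same device, even though a query may read both at once. Capturing that kind of interaction is the sort of workload information the talk argues a layout tool needs.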
DB Meeting: Friday October 3, 2:00pm, DC 1331
Speaker: Ken Salem
Title: Amazon's Dynamo
Abstract:
I plan to
discuss Dynamo,
a scalable data storage system developed by Amazon.com to support its
web services.
Seminar: Tuesday October 7, 1:00pm, DC 1302
Speaker: Irfan Ahmad, VMware
Title: Topics in Virtualization Research
Abstract:
The initial impact of virtual machines (VMs) on commodity hardware was
in the software development community, where virtual machines eased the
pain of development and test cycles. Large corporations were not far
behind in adopting VM technology to utilize their hardware resources
more fully.
As these trends take hold, several interesting problems worthy of
research emerge. I'll provide a survey of these topics, covering cloud
and datacenter file systems; distributed I/O resource management;
security; mobile device virtualization and profile movement back and
forth between the cloud and devices; resource management, load
balancing, and live migration in very large clusters of up to 100,000 nodes;
support for large database systems; paravirtualization; and technical
and economic models for cloud computing.
I'll leave plenty of time at the end for questions and a lively
discussion. All attendees will receive a copy of VMware Workstation
for the PC or VMware Fusion for the Mac. Insightful questions and comments
will receive additional swag. Oh, and refreshments too.
Bio:
Irfan is a member of the Resource Management team at VMware. Most
recently he has been researching novel techniques for proportional
sharing of distributed I/O resources, as well as techniques for
transparent and efficient classification of virtual machine memory
pages. In the past, Irfan has worked on disk I/O workload
characterization and on performance enhancements for running database
workloads in virtual machines.
DB Meeting: Friday October 24, 2:00pm, DC 1331
Speaker: George Beskales
Title: Survey of Record Deduplication Techniques
Abstract:
Real-world databases suffer from various data quality problems, including
duplicate records, violated integrity constraints, and missing values.
Data cleaning is the process of detecting and correcting such anomalies
to improve data quality. I will focus on one goal of the data cleaning
process: eliminating duplicate records. I plan to review
state-of-the-art duplicate elimination techniques. I will also
highlight characteristics of the deduplication problem and present a
taxonomy of clustering algorithms used in record deduplication.
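One simple member of the clustering family mentioned above treats records as nodes, links every pair whose similarity reaches a threshold, and takes connected components (the transitive closure of the "is a duplicate of" relation) as duplicate clusters. A minimal Python sketch, assuming Jaccard similarity over lowercase word tokens, a 0.5 threshold, and union-find for the closure; these choices and the sample records are illustrative, not drawn from the talk.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|; two empty token sets count as identical.
    return len(a & b) / len(a | b) if a | b else 1.0


def dedup_clusters(records: list, threshold: float = 0.5) -> list:
    """Cluster records by transitive closure of pairwise similarity."""
    tokens = [set(r.lower().split()) for r in records]
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    records = [
        "John Smith 123 Main St",
        "Jon Smith 123 Main St",
        "Mary Jones 9 Elm Ave",
        "Mary Jones 9 Elm Avenue",
        "Bob Lee 77 Oak Rd",
    ]
    print(dedup_clusters(records))  # [[0, 1], [2, 3], [4]]
```

Transitive closure can chain together records that are not similar to each other (if A matches B and B matches C, then A and C share a cluster even when they differ), which is one reason other clustering algorithms are used for deduplication.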
Seminar: Friday November 7, 2:00pm, DC 1331
Speaker: Panos Ipeirotis, New York University
Title: Structuring and querying online opinions using econometrics
Abstract:
Today, users post online reviews expressing their opinions about
movies, restaurants, and many other products. They also evaluate
merchants and react to news about political campaigns.
Structuring and ranking these opinions in terms of importance and
polarity is a difficult research problem. How can we infer the
importance and polarity of the posted content? How can we
structure and quantify the effect of online opinions? Many
existing approaches rely on human annotators to evaluate the
polarity and strength of opinions, a laborious and error-prone
task. We take a different approach by considering the
economic context in which an opinion is evaluated. We rely on the
fact that the text in online systems influences the behavior of
readers, and this effect can be observed using easy-to-measure
economic variables, such as revenues or product prices.
Then, by reversing the logic, we infer the semantic orientation
and the strength of an opinion by tracing the changes in the
associated economic variable. In effect, we combine econometrics
with text mining algorithms to identify the "economic value of
text" and assign a "dollar value" to each opinion, quantifying
sentiment effectively and without manual effort.
We present applications to reputation systems, online product
reviews, travel search, and the effect of online media on elections.
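The "reversing the logic" step can be illustrated with an ordinary least-squares regression: regress an economic variable (here a price premium) on indicators for opinion phrases, and read each fitted coefficient as that phrase's "dollar value". A minimal Python sketch under loud assumptions: the phrases, the data, and the plain OLS model are hypothetical stand-ins, not the speaker's actual econometric model.

```python
def ols(X: list, y: list) -> list:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X^T X | X^T y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][k] / a[i][i] for i in range(k)]


if __name__ == "__main__":
    # Hypothetical data: each row is (intercept, feedback mentions "fast delivery",
    # feedback mentions "poor packaging"); y is the merchant's price premium in dollars.
    features = ["intercept", "fast delivery", "poor packaging"]
    X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 1, 0], [1, 0, 1]]
    y = [10, 12, 7, 9, 12, 7]
    coef = ols(X, y)
    print({f: round(b, 2) for f, b in zip(features, coef)})
    # {'intercept': 10.0, 'fast delivery': 2.0, 'poor packaging': -3.0}
```

A positive coefficient marks a phrase with positive orientation, and its magnitude gives a strength in dollars, with no human annotation involved.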
DB Meeting: Friday November 14, 2:00pm, DC 1331
Speaker: Frank Tompa
Title: Three text-indexing papers from CIKM 2008
Abstract:
I will present three papers from the recent CIKM conference,
each of which deals with a different type of text index
and a different aspect of index performance:
- Derrick Coetzee (MSR) presented "TinyLex: Static N-Gram Index Pruning with Perfect Recall,"
in which he showed how to build effective indexes with lexicons of smaller size by using variable-length n-grams.
- Ercegovac, Josifovski, Li, Mediano, and Shekita (IBM & Yahoo)
presented "Supporting Sub-Document Updates and Queries in an Inverted Index,"
in which they showed how to accommodate updates to sections of documents
without re-indexing complete documents and
yet without causing excessive overhead when querying.
- Barsky, Stege, Thomo, and Upton (Victoria) presented "A New Method for Indexing Genomes Using On-Disk Suffix Trees,"
in which they showed how to build very large suffix trees faster than by using previous methods.
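For readers new to n-gram indexes, the sketch below shows the generic fixed-length baseline: index every character trigram, intersect the posting lists of a query's trigrams to get candidates, then verify each candidate. Recall is perfect because any document containing the pattern contains all of its trigrams and so survives the intersection; verification removes false positives. This is a minimal illustration in Python with hypothetical names and documents; TinyLex's actual contribution (variable-length n-grams and pruning to shrink the lexicon) is not shown.

```python
from collections import defaultdict


def build_ngram_index(docs: list, n: int = 3) -> dict:
    """Map every character n-gram to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for i in range(len(text) - n + 1):
            index[text[i:i + n]].add(doc_id)
    return index


def substring_search(docs: list, index: dict, pattern: str, n: int = 3) -> list:
    if len(pattern) < n:
        # Too short to use the index: fall back to scanning every document.
        return sorted(d for d, t in enumerate(docs) if pattern in t)
    grams = [pattern[i:i + n] for i in range(len(pattern) - n + 1)]
    # Candidates must contain every n-gram of the pattern.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Verification discards candidates whose n-grams are not contiguous.
    return sorted(d for d in candidates if pattern in docs[d])


if __name__ == "__main__":
    docs = ["flash memory", "hard disk", "flash disk"]
    idx = build_ngram_index(docs)
    print(substring_search(docs, idx, "disk"))   # [1, 2]
    print(substring_search(docs, idx, "ash d"))  # [2]
```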
DB Meeting: Friday November 21, 2:00pm, DC 1331
Speaker: Dan Farrar
Title: Making databases functional
Abstract:
The effectiveness of a particular database language and programming
interface for a given application is closely related to the semantics and
native design patterns of the application language. Creating robust and
high-performance database applications requires that the tools and
languages used fit well, both with each other and with the
application domain. In this talk, I will discuss a taxonomy of database
interfaces, show how these map to a taxonomy of application programming
languages, and describe the downsides of mixing these mappings (one example
is the object-relational impedance mismatch). I will discuss traditional
imperative and object-oriented languages, and then look in more
detail at functional languages and their associated database interfaces,
with a focus on LINQ and Haskell.
References:
1. Integrating Programming Languages & Databases: What's the Problem?, Cook and Ibrahim
(http://www.cs.utexas.edu/users/wcook/Drafts/2005/PLDBProblem.pdf)
2. Monad Comprehensions: A Versatile Representation for Queries, Grust
(http://www.inf.uni-konstanz.de/dbis/publications/download/monad-comprehensions.pdf)
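The Grust reference argues that comprehensions are a natural representation for queries. As a hedged illustration, here is a SQL-style select-from-where join written as a list comprehension, the list case of that idea. Python stands in for Haskell here; Haskell list comprehensions and LINQ's `from ... where ... select` query syntax have the same shape. The schema, function name, and data are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Customer(NamedTuple):
    cid: int
    name: str
    city: str


class Order(NamedTuple):
    oid: int
    cid: int
    total: float


def waterloo_order_totals(customers: list[Customer], orders: list[Order]) -> list[tuple[str, float]]:
    # SELECT c.name, o.total
    # FROM customers c, orders o
    # WHERE c.cid = o.cid AND c.city = 'Waterloo'
    return [(c.name, o.total)
            for c in customers          # FROM customers c
            for o in orders             # , orders o
            if c.cid == o.cid and c.city == "Waterloo"]  # WHERE ...


if __name__ == "__main__":
    customers = [Customer(1, "Ada", "Waterloo"), Customer(2, "Bo", "Toronto"),
                 Customer(3, "Cy", "Waterloo")]
    orders = [Order(10, 1, 25.0), Order(11, 2, 40.0), Order(12, 1, 15.5), Order(13, 3, 8.0)]
    print(waterloo_order_totals(customers, orders))
    # [('Ada', 25.0), ('Ada', 15.5), ('Cy', 8.0)]
```

Because the query is an ordinary expression in the host language, it is type-checked and composed like any other value, which is the kind of language/interface fit the talk discusses. The comprehension evaluates as a nested loop; a real query compiler would choose a better join strategy.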
DB Meeting: Friday November 28, 2:00pm, DC 1331
Speaker: David Toman
Title: Another look at query reformulation under complex constraints and mappings
Abstract:
Rewriting queries formulated over high-level conceptual
schemas into query plans (essentially, queries over the physical artifacts
that make up the physical design supporting the high-level conceptual
design) is one of the essential tasks performed by query optimizers
(and by various mapping, integration, exchange, and similar layers). In this
talk, we look at the common underpinnings of this problem and at an
alternative approach to it based on the work of Beth and Craig on
definability in first-order logic.
This page is maintained by
Ashraf Aboulnaga.