|DB Meeting:||Friday September 16, 2:00pm, DC1331|
|DB Seminar:||Monday September 19, 11:00am, DC1304|
|Speaker:||Kevin Chang, University of Illinois, Urbana-Champaign|
|Title:||Building a MetaQuerier and Beyond: A Trilogy of Search, Integration, and Mining for Web Information Access|
|DB Meeting:||Friday September 23, 2:00pm, DC1331|
|Topic:||View Matching for Outer-Joins, OQL, and Conjunctive XQuery|
This talk will be an informal attempt to draw connections between three papers:
1. "View Matching for Outer-Join Views", Larson & Zhou, VLDB 2005
2. "Deciding Containment for Queries with Complex Objects", Levy & Suciu, PODS 1997
3. "Containment of Nested XML Queries", Dong, Halevy & Tatarinov, VLDB 2004
The first paper takes previous work by Cesar Galindo-Legaria on normal forms (hence equivalence) for select-project-outer-join expressions and uses it as the basis for a view matching algorithm. The next two papers consider the containment/equivalence problems for conjunctive OQL and XQuery, respectively. I obviously won't be able to describe all three papers in detail, so I'll focus primarily on the Larson paper, and then address the other two papers at a fairly high level, pointing out the similarities and differences in the problems being considered.
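As background for the second and third papers: for plain conjunctive queries (no nesting, no outer joins), the classical result of Chandra and Merlin says that Q1 is contained in Q2 exactly when there is a containment mapping (a homomorphism) from Q2 to Q1. Such a mapping sends Q2's head onto Q1's head and every body atom of Q2 onto some body atom of Q1. The sketch below is a minimal backtracking search for such a mapping. It assumes a hypothetical encoding of a query as `(head, body)`, where `body` is a list of `(predicate, args)` atoms and every argument is a variable (no constants). It covers only this plain-CQ building block. The complex-object and nested-XML settings studied in the papers need extensions that this sketch does not attempt.

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]          # (predicate, argument variables)
Query = Tuple[Tuple[str, ...], List[Atom]]  # (head variables, body atoms)


def _extend(atoms: List[Atom], i: int, mapping: Dict[str, str], target: List[Atom]) -> bool:
    """Try to map atoms[i:] onto atoms of `target`, extending `mapping` consistently."""
    if i == len(atoms):
        return True
    pred, args = atoms[i]
    for tpred, targs in target:
        if tpred != pred or len(targs) != len(args):
            continue
        candidate = dict(mapping)  # copy so a failed branch leaves `mapping` untouched
        if all(candidate.setdefault(a, b) == b for a, b in zip(args, targs)):
            if _extend(atoms, i + 1, candidate, target):
                return True
    return False


def contained_in(q1: Query, q2: Query) -> bool:
    """True iff q1 is contained in q2, i.e. a containment mapping q2 -> q1 exists."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    mapping: Dict[str, str] = {}
    for v2, v1 in zip(head2, head1):  # the head of q2 must map onto the head of q1
        if mapping.setdefault(v2, v1) != v1:
            return False
    return _extend(body2, 0, mapping, body1)


def equivalent(q1: Query, q2: Query) -> bool:
    return contained_in(q1, q2) and contained_in(q2, q1)


# Q1(X) :- R(X, Y), R(Y, X)      Q2(A) :- R(A, B)
q1: Query = (("X",), [("R", ("X", "Y")), ("R", ("Y", "X"))])
q2: Query = (("A",), [("R", ("A", "B"))])
print(contained_in(q1, q2), contained_in(q2, q1))  # True False
```

Equivalence is tested as containment in both directions, which is how containment and equivalence are tied together in the papers above.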
|DB Seminar:||Tuesday October 4, 11:00am, MC5136 (Please note the unusual day and place)|
|Speaker:||Kaladhar Voruganti, IBM Almaden Research Center|
|Title:||SMaestro: Second Generation Storage Infrastructure Management|
|Master's Thesis Presentation:||Friday October 7, 10:00am, DC1304|
|Title:||Communication Cost Modeling for Federated Database Systems|
|DB Meeting:||Friday October 7, 2:00pm, DC1331|
|Title:||From Conjunctive Queries to XPath, between Theory and Practice: Some Realisations and Perspectives|
I worked in the past on adaptive algorithms for conjunctive queries (aka
Google-like queries), and adapted some of those algorithms for a search engine on
file systems. In recent years I have tried to generalize the theoretical analysis
to some queries on XML documents, and in the last few months I have been
implementing some of my algorithms and testing them on data provided by Google.
At this point I would like to get some feedback from the database group before going forward in any direction (theoretical analysis vs. practical work, exact vs. approximate queries, conjunctive vs. structured queries), and possibly discuss potential collaborations.
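The abstract does not say which adaptive algorithms are meant. As a hedged illustration only, the sketch below answers a conjunctive (AND) query by intersecting sorted posting lists with galloping (doubling) search. This is a standard adaptive technique: each list is skipped ahead by exponentially growing steps, so the work depends on how the lists interleave rather than only on their total length. The functions `gallop` and `intersect` are hypothetical names, and the sketch is not presented as the speaker's own algorithm.

```python
from bisect import bisect_left
from typing import List, Sequence


def gallop(arr: Sequence[int], target: int, lo: int) -> int:
    """Smallest index i >= lo with arr[i] >= target (len(arr) if none).

    Probes positions at doubling distances from lo, then binary-searches the
    last interval, so the cost grows with the log of the distance skipped.
    """
    n = len(arr)
    if lo >= n or arr[lo] >= target:
        return lo
    step, hi = 1, lo + 1
    while hi < n and arr[hi] < target:
        lo, step = hi, step * 2
        hi = lo + step
    # Invariant: arr[lo] < target, and either hi >= n or arr[hi] >= target.
    return bisect_left(arr, target, lo + 1, min(hi, n))


def intersect(lists: List[List[int]]) -> List[int]:
    """Intersect k sorted lists of distinct ints (e.g. posting lists of an AND query)."""
    k = len(lists)
    if k == 0 or any(not lst for lst in lists):
        return []
    if k == 1:
        return list(lists[0])
    pos = [0] * k
    result: List[int] = []
    candidate, matches, i = lists[0][0], 1, 1  # list 0 is known to contain candidate
    while True:
        pos[i] = gallop(lists[i], candidate, pos[i])
        if pos[i] == len(lists[i]):
            return result  # list i is exhausted: nothing larger can be common
        value = lists[i][pos[i]]
        if value == candidate:
            matches += 1
            if matches == k:  # every list contains candidate
                result.append(candidate)
                pos[i] += 1
                if pos[i] == len(lists[i]):
                    return result
                candidate, matches = lists[i][pos[i]], 1
        else:
            candidate, matches = value, 1  # value > candidate: new candidate from list i
        i = (i + 1) % k


print(intersect([[1, 3, 5, 7, 9, 11], [3, 4, 5, 9, 20], [0, 3, 9, 10]]))  # [3, 9]
```

Cycling round-robin over the lists and always jumping to the largest value seen so far means that a list containing long runs of non-matching entries is skipped in a few probes instead of being scanned element by element.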
|DB Meeting:||Friday October 14, 2:00pm, DC1331|
|Title:||How does records management fit into things?|
|Abstract:||"Records management is the application of systematic and scientific controls to recorded information required in the operation of an organization's business." I will start with an overview of various components of records management, concentrating on its application to electronic records. Thereafter I'll look briefly at some implications of an information retention and disposition program on historical databases, backup and recovery, and access control.|
|DB Seminar:||Monday October 17, 11:00am, DC1304|
|Speaker:||Volker Markl, IBM Almaden Research Center|
|Title:||Learning in Query Optimization|
|DB Meeting:||Friday October 21, 2:00pm, DC1331|
|Title:||Capturing more meaning for the semantic web|
|Abstract:||I'll review a sequence of progressively richer ontology languages that have been proposed for the semantic web, beginning with RDF and ending with (full) OWL. I hope this will also spark a lively discussion of the idea of a semantic web.|
|DB Seminar:||Monday October 24, 11:00am, DC1304|
|Speaker:||Divesh Srivastava, AT&T Labs Research|
|Title:||Approximate Joins: Concepts and Techniques|
The quality of the data residing in information repositories
and databases degrades for a multitude of reasons.
In the presence of data quality errors, a central problem is
to identify all pairs of entities (tuples) in two sets of
entities that are approximately the same. This operation has
been studied for many years and is known under various
names, including record linkage, entity identification,
entity reconciliation and approximate join, to name a few.
The objective of this talk is to provide an overview of key
research results and techniques used for approximate joins.
This is joint work with Nick Koudas.
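Many techniques exist for the approximate join described above. As one minimal, hedged example (not necessarily one covered in the talk), the sketch below joins two lists of strings on the Jaccard similarity of their character q-grams. The values `q=3` and `threshold=0.5`, the `#`/`$` padding characters, and all function names are illustrative assumptions. The sketch compares every pair, which is quadratic. Practical systems add filtering or indexing (for example on shared q-grams) to avoid comparing every pair.

```python
from typing import List, Set, Tuple


def qgrams(s: str, q: int = 3) -> Set[str]:
    """Set of character q-grams of s, lower-cased and padded at both ends."""
    padded = "#" * (q - 1) + s.lower() + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def approximate_join(left: List[str], right: List[str],
                     threshold: float = 0.5, q: int = 3) -> List[Tuple[str, str, float]]:
    """All (l, r, sim) with q-gram Jaccard similarity >= threshold (naive nested loop)."""
    right_grams = [(r, qgrams(r, q)) for r in right]  # compute each right-side set once
    result = []
    for l in left:
        lg = qgrams(l, q)
        for r, rg in right_grams:
            sim = jaccard(lg, rg)
            if sim >= threshold:
                result.append((l, r, sim))
    return result


pairs = approximate_join(["AT&T Labs Research", "IBM Almaden Research Center"],
                         ["AT&T Labs - Research", "Sybase iAnywhere"])
for l, r, sim in pairs:
    print(f"{l!r} ~ {r!r} ({sim:.2f})")
```

Padding the string ends lets short strings and shared prefixes or suffixes still contribute q-grams. Lowering the threshold trades more false matches for fewer missed ones.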
|DB Meeting:||Friday October 28, 2:00pm, DC1331|
|Speaker:||Dan Farrar, Sybase iAnywhere|
|Title:||Optimizing the Other Half of Database Applications|
In practice, the factor most limiting the performance and scalability of
many database applications is not the DBMS itself, but the expertise of
the application developers. Application architecture and interfacing
issues can impose significant penalties on system performance. However,
for non-expert designers and programmers, identifying these issues can be
difficult. Making this determination easier is an important part of
reducing the total cost of database ownership.
In this talk, I will discuss common kinds of problems that can hurt database application performance. I will talk about the type of support that a DBMS must provide to help application developers identify and resolve these problems, and how this need is addressed by the major vendors. I will also suggest areas in which this functionality can be leveraged to automatically improve database performance.
|DB Meeting:||Friday November 4, 2:00pm, MC 5158 (Not our regular meeting room)|
|Title:||XSEED: Accurate and Fast Cardinality Estimation for XPath Queries|
In this talk, I am going to present XSEED, a synopsis of path queries
for cardinality estimation. The synopsis is constructed by starting from
a very small kernel, which is then incrementally updated. With such an
incremental construction, the synopsis structure can be dynamically
configured to accommodate different memory and construction-time
budgets. Cardinality estimation based on XSEED can be performed very
efficiently and accurately. Extensive experiments on both synthetic and
real data sets show that, even with less memory, XSEED can be an order
of magnitude more accurate than other synopsis structures. The estimation
time is under 2% of the actual querying time for a wide range of queries
in all test cases.
This is joint work with M. Tamer Ozsu, Ashraf Aboulnaga, and Ihab F. Ilyas.
|DB Seminar:||Thursday November 10, 11:00am, DC 1304 (Please note the unusual day)|
|Speaker:||Sihem Amer-Yahia, AT&T Labs-Research|
|Title:||The Role of Document Structure in Querying, Scoring and Evaluating XML Full-Text Search|
|DB Meeting:||Friday November 11, 2:00pm, DC1331|
|Title:||TREC 2005 Spam Track Overview|
|Abstract:||TREC's Spam Track introduces a standard testing framework that presents a chronological sequence of email messages, one at a time, to a spam filter for classification. The filter yields a binary judgement (spam or ham [i.e. non-spam]) which is compared to a human-adjudicated gold standard. The filter also yields a spamminess score, intended to reflect the likelihood that the classified message is spam, which is the subject of post-hoc ROC (Receiver Operating Characteristic) analysis. The gold standard for each message is communicated to the filter immediately following classification. Eight test corpora -- email messages plus gold standard judgements -- were used to evaluate 53 subject filters. Five of the corpora (the public corpora) were distributed to participants, who ran their filters on the corpora using a track-supplied toolkit implementing the framework. Three of the corpora (the private corpora) were not distributed to participants; rather, participants submitted filter implementations that were run, using the toolkit, on the private data. Twelve groups participated in the track, submitting 44 filters for evaluation. The other nine subject filters were variants of popular open-source implementations adapted for use in the toolkit in consultation with their authors.|
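The evaluation protocol described above can be sketched as an online loop. Each message is classified one at a time in chronological order, and its gold standard is revealed before the next message arrives. The sketch assumes a hypothetical filter interface with `classify` and `train` methods. `ToyFilter`, `run_online`, and `roc_auc` are illustrative names, not the track toolkit's actual interface. Here the ROC analysis is reduced to a single number, the area under the ROC curve, computed with the pairwise (Mann-Whitney) formulation.

```python
from collections import Counter
from typing import Callable, List, Tuple


class ToyFilter:
    """Hypothetical stand-in filter (not a TREC participant): a message's spamminess
    is the fraction of its tokens seen more often in spam than in ham so far."""

    def __init__(self) -> None:
        self.spam_counts: Counter = Counter()
        self.ham_counts: Counter = Counter()

    def classify(self, text: str) -> Tuple[bool, float]:
        tokens = text.lower().split()
        if not tokens:
            return False, 0.0
        spammy = sum(self.spam_counts[t] > self.ham_counts[t] for t in tokens)
        score = spammy / len(tokens)
        return score > 0.5, score  # (binary judgement, spamminess score)

    def train(self, text: str, is_spam: bool) -> None:
        (self.spam_counts if is_spam else self.ham_counts).update(text.lower().split())


def roc_auc(scores: List[float], golds: List[bool]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen spam
    message scores higher than a randomly chosen ham message (ties count 1/2)."""
    spam = [s for s, g in zip(scores, golds) if g]
    ham = [s for s, g in zip(scores, golds) if not g]
    if not spam or not ham:
        raise ValueError("need at least one spam and one ham message")
    wins = sum(1.0 if s > h else 0.5 if s == h else 0.0 for s in spam for h in ham)
    return wins / (len(spam) * len(ham))


def run_online(classify: Callable[[str], Tuple[bool, float]],
               train: Callable[[str, bool], None],
               messages: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Feed (text, gold_is_spam) pairs in chronological order; return (accuracy, ROC AUC)."""
    scores, golds, correct = [], [], 0
    for text, gold in messages:
        predicted, score = classify(text)
        if predicted == gold:
            correct += 1
        scores.append(score)
        golds.append(gold)
        train(text, gold)  # gold standard revealed immediately after classification
    return correct / len(messages), roc_auc(scores, golds)


f = ToyFilter()
stream = [("buy cheap pills", True), ("meeting at 2pm", False),
          ("cheap pills now", True), ("meeting moved", False)]
print(run_online(f.classify, f.train, stream))  # (0.75, 0.75)
```

The binary judgement and the spamminess score are evaluated separately. Accuracy depends on the fixed 0.5 cutoff, while the ROC area summarizes how well the scores rank spam above ham across all possible cutoffs.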
|DB Seminar:||Monday November 14, 11:00am, DC1304|
|Speaker:||Ling Liu, Georgia Institute of Technology|
|Title:||MobiEyes: Distributed Processing of Moving Queries over Moving Objects|
|DB Meeting:||Friday December 2, 2:00pm, DC1331|
|Speaker:||Anil Goel, Sybase iAnywhere|
|Topic:||Supporting Multiple View Maintenance Policies in DBMS|
In this talk I will present the issues related to multiple maintenance
policies for materialized views in a database management system and how
the optimizer decides when to use materialized views to answer a
query. I will first review the pioneering ideas presented in the paper [1] on
the subject, and then I will discuss
how the work on which paper  is based relates to this issue. A quick
review of how commercial database management systems implement multiple
maintenance policies will follow.
[1] Latha S. Colby, Akira Kawaguchi, Daniel F. Lieuwen, Inderpal Singh Mumick, Kenneth A. Ross: "Supporting Multiple View Maintenance Policies", SIGMOD 1997
|DB Seminar:||Monday December 5, 11:00am, DC1304|
|Speaker:||Mary Fernandez, AT&T Labs - Research|
|Title:||Implementing XQuery 1.0: The Story of Galax|