Fall 2002
DB meeting: |
Friday, September 6th, 2:00 pm, DC3303
(the DB Lab) |
Speaker: |
Kick-off meeting - no speaker today. |
Topic: |
Help welcome new faculty and grads. Coffee and snacks will be
provided. |
DB meeting: |
Friday, September 13th, 2:00 pm, DC1331 |
Speaker: |
Feng Zhao |
Topic: |
Exploring Altruistic Links in a Web Crawler |
Abstract: |
This thesis examines issues in the design of an effective Web
crawler, one that retrieves high-quality pages early and quickly.
A crawler is a program that retrieves Web pages, commonly for use by
a search engine, to maintain hypertext structures, or to summarize resources.
It traverses the Web, navigating through hyperlinks, retrieving the content
of associated pages. Crawlers are widely used today. The design of a good
crawler presents many challenges.
This thesis first discusses the architecture and design of a Web crawler.
We then discuss a definition for high-quality Web pages. We also introduce
the concept of altruistic links, which we define as links pointing from
commercial websites to non-commercial websites and vice versa. We suspect
that altruistic links may indicate higher quality in the Web pages they
reference, and that such pages may better answer the questions of Web search
engine users. Finally, we estimate and compare the quality
of Web pages referenced by altruistic links and by other kinds of links.
We implemented a Web crawler and built a set of pilot experiments for
finding and crawling altruistic links; we also compared the page quality
of the Web pages for different kinds of links. |
Note: |
This is an MMath thesis presentation. |
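The definition of an altruistic link given in the abstract can be illustrated with a small sketch. The abstract does not say how a site is judged commercial, so the `is_commercial` test below (a host name ending in `.com`) is purely an assumption for illustration, as are the function names.

```python
from urllib.parse import urlparse


def is_commercial(url: str) -> bool:
    """Hypothetical test: treat any host under .com as commercial."""
    host = urlparse(url).hostname or ""
    return host.lower().endswith(".com")


def is_altruistic(source_url: str, target_url: str) -> bool:
    """A link is altruistic when exactly one of its endpoints is commercial,
    i.e. it points from a commercial site to a non-commercial one or vice versa."""
    return is_commercial(source_url) != is_commercial(target_url)
```

A crawler built along these lines could call `is_altruistic` on each (page, outgoing link) pair to decide how to prioritize the link.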
DB meeting: |
Friday, September 20th, 2:00 pm, DC1331 |
Speaker: |
David Toman |
Topic: |
TBA |
SCS Research Group Presentation: |
Friday, September 27th, 3:30 pm, DC1302 |
Speaker: |
DB and Programming Languages Groups |
DB meeting: |
Friday, October 11th, 2:00 pm, DC1331 |
Speaker: |
David DeHaan |
Topic: |
I will give a survey of some algorithms for estimating the number of
distinct values in a dataset using limited working memory.
Solving this problem using random sampling has been well studied in the
statistics literature, and I will not discuss that work other than to present
a lower bound on its hardness (Charikar et al., PODS 2000). Instead, I will
focus on some recent (and not-so-recent) one-pass algorithms. |
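As a rough illustration of what a one-pass, limited-memory distinct-value estimator can look like, here is a minimal sketch in the style often called k-minimum values. It is not necessarily one of the algorithms covered in the talk; the function name, the choice of SHA-1 as the hash, and the default `k` are all assumptions made for the example.

```python
import hashlib
import heapq


def estimate_distinct(values, k=256):
    """Estimate the number of distinct values in one pass using O(k) memory.

    Each value is hashed to a pseudo-random point in [0, 1). Only the k
    smallest distinct hash points are kept. If the k-th smallest point is x,
    roughly (k - 1) / x distinct values were seen.
    """
    heap = []        # max-heap of the k smallest hash points (stored negated)
    members = set()  # the hash points currently held in the heap
    for v in values:
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big") / 2**64
        if h in members:
            continue  # duplicate of a value already tracked
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # New point is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        # Fewer than k distinct hash points seen: the count is exact
        # (ignoring hash collisions).
        return float(len(heap))
    return (k - 1) / -heap[0]
```

Larger `k` trades memory for accuracy; the relative error shrinks roughly with the square root of `k`.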
DB meeting: |
Friday, October 18th, 2:00 pm, DC1331 |
Speaker: |
Gísli Hjaltason |
Topic: |
The Index Fabric: Properties and Challenges |
Abstract: |
This talk describes the Index Fabric, a novel indexing structure
that is suitable for large collections of long keys, such as might occur
when indexing XML documents. Data represented in the Index Fabric is encoded
using semantic tags embedded in the indexed keys, so multiple search paths
can be present in the same index. Furthermore, relationships can be represented
in the index by building composite keys, thus allowing search over a relationship
without requiring join operations. Two challenges faced in reaching
a practical implementation of the Index Fabric are briefly described:
obtaining a small index structure, and concurrency
control. A solution to the former problem is described. |
DB meeting: |
Friday, October 25th, 2:00 pm, DC1331 |
Speaker: |
Benjamin Bin Yao |
Topic: |
XBench - A Family of Benchmarks for XML DBMSs |
Abstract: |
XML is beginning to be used extensively in various application
domains, and as a result, large numbers of XML documents are being generated.
Researchers in both industry and academia have proposed a number of approaches
to efficiently store, manipulate, and retrieve XML documents. The individual
performance characteristics of these approaches, as well as the relative
performance of various systems, are an ongoing concern.
XML applications and the data that they manage are quite
varied, and no single database schema and workload can properly capture this
variety. We propose a family of XML benchmarks, collectively called XBench,
to measure and evaluate the performance of different approaches to
managing XML documents. The family is defined according to
a classification of applications, and each class has its own database and
workload.
We will discuss the general requirements for an XML DBMS benchmark,
followed by a detailed explanation of XBench, including the methodology
of database generation, the workload, and the setup of the test environment.
A brief discussion of other existing XML benchmarks and a comparison among
them will be given as well. |
DB meeting: |
Friday, November 1st, 2:00 pm, DC1331 |
Speaker: |
Heng Yu |
Topic: |
I will present the paper "Principles and Realization Strategies of
Multilevel Transaction Management" by Gerhard Weikum, published in
ACM Transactions on Database Systems, Vol. 16, No. 1, 1991.
Traditionally, research on transaction control has mainly focused on
operations at a high level of abstraction, such as the object or tuple level.
However, the implementation of such operations spans several
underlying levels, typically ending at the bottom with page-level operations.
For performance reasons, low-level operations that contribute to high-level
ones usually interleave, so concurrency control must take care of all layers.
Multilevel transaction management therefore becomes significant. The paper
establishes a tree-like framework of multilevel transactions. Based on
this framework, multilevel serializability is studied, and corresponding concurrency
control and recovery techniques are developed and evaluated. |
DB meeting: |
Friday, November 8th, 2:00 pm, DC1331 |
Speaker: |
Ning Zhang |
Topic: |
Optimizing Correlated Path Queries in XML Languages |
Abstract: |
Path expressions are ubiquitous in XML processing languages such as
XPath, XQuery, XSLT, and XPointer. Expressions in these languages typically
include multiple path expressions, some of which are correlated. Existing
approaches evaluate these path expressions one-at-a-time and miss the optimization
opportunities that may be gained by exploiting the correlations among them.
In this paper, we identify the types of correlations between multiple path
expressions in a query, and propose optimizations that are based on pattern
graph matching. These optimization techniques rewrite pattern graphs at
the logical level and produce a set of equivalent pattern graphs from which
a physical optimizer can choose given an appropriate cost function. |
DB meeting: |
Friday, November 15th, 2:00 pm, DC1331 |
Speaker: |
Vahid Karimi |
Topic: |
WebDAV and DeltaV for Collaborative Authoring and Versioning |
Abstract: |
Web Distributed Authoring and Versioning (WebDAV) and DeltaV (DAV Versioning)
are defined by the Internet Engineering Task Force (IETF) in RFC 2518 and
RFC 3253, respectively. WebDAV and DeltaV both extend HTTP to support
collaborative authoring and versioning on the Web, and both use XML
extensively to do so. Therefore there is a close working
relationship between the World Wide Web Consortium (W3C) and the IETF.
In this survey, I review the extensions that WebDAV and DeltaV introduce
to HTTP. I conclude this survey with the lessons that I think can be learned
from this work.
(This is an MMath essay presentation.) |
DB meeting: |
Friday, November 22nd, 2:00 pm, DC1331 |
Speaker: |
Ian Davis |
Topic: |
Integrating XML and SQL |
Abstract: |
There is growing interest in integrating SQL and XML. Data stored
in relational tables needs to be translated by SQL engines into XML encoded
structures, and XML encoded structures need to be mapped into relational
tables.
In this talk I will be discussing my participation in the H2.3 working
group, which is writing the SQL XML-Related Specification (SQL/XML).
This section of the SQL standard defines how arbitrary SQL data may be
translated without loss of information content into XML. H2.3 reports to
SC21 of the ISO, which is responsible for progressing the international
SQL standard.
I will also be presenting software I have developed that employs an
extended XPath language to map XML data into the desired SQL views.
This software is to be presented at the XML 2002 conference to be held
this December in Baltimore.
|
DB meeting: |
Friday, November 29th, 2:00 pm, DC1331 |
Speaker: |
Gord Cormack |
Topic: |
Question Answering at TREC 2002 |
Abstract: |
For the fourth year, the Text Retrieval Conference (TREC)
has a Question Answering task, in which systems find answers to short factual
questions within an unstructured corpus of text. The results are
presented in November. I have just returned from TREC 2002 and will
describe the task, the results, and some approaches to the problem. |
DB meeting: |
Friday, December 6th, 2:00 pm, DC1331 |
Speaker: |
Meng He |
Topic: |
Indexing Compressed Texts (MMath thesis presentation) |
Abstract: |
As a result of the rapid growth of the volume of electronic
data, text compression and indexing techniques are receiving more and more
attention. These two issues are usually treated as independent problems,
but approaches that combine them have recently attracted the attention
of researchers.
In this thesis, we review and test some of the more effective and some
of the more theoretically interesting techniques. Various compression and
indexing techniques are presented, and we also present two compressed text
indices. Based on these techniques, we implement a compressed full-text
index, so that compressed texts can be indexed to support fast queries
without decompressing the whole text. The experiments show that our index
is compact and supports fast search. |
DB meeting: |
Friday, December 13th, 2:00 pm, DC1331 |
Speaker: |
Ani Nica |
Topic: |
Evaluation of SQL subqueries in relational database systems |
Abstract: |
In this talk I will present some popular techniques
for eliminating SQL subqueries in relational database systems. SQL
subqueries can be converted into equivalent expressions using special
relational operators. The presentation will also cover further optimizations
for transforming these special relational operators into general join operators
in a cost-based optimizer. |
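To make the idea of subquery elimination concrete, the sketch below compares an `IN` subquery with a hand-written join that returns the same rows. It only illustrates the kind of equivalence involved and is not the optimizer technique presented in the talk; the `customer` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

# Hypothetical schema and data, created in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 75.0), (12, 3, 20.0);
""")

# Original form: customers that have at least one order over 10.
with_subquery = """
SELECT c.name FROM customer c
WHERE c.id IN (SELECT o.customer_id FROM orders o WHERE o.total > 10)
ORDER BY c.name
"""

# Rewritten form: the subquery becomes a join. DISTINCT on the inner side
# keeps a customer with several matching orders from appearing twice,
# which preserves the semi-join semantics of IN.
with_join = """
SELECT c.name FROM customer c
JOIN (SELECT DISTINCT customer_id FROM orders WHERE total > 10) o
  ON o.customer_id = c.id
ORDER BY c.name
"""

rows_subquery = conn.execute(with_subquery).fetchall()
rows_join = conn.execute(with_join).fetchall()
```

Once the query is expressed as a join, a cost-based optimizer is free to consider different join orders and join methods for it.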
This page is maintained by Ken Salem.