Database Research Group Events

Fall 2002

DB meeting: Friday, September 6th, 2:00 pm, DC3303 (the DB Lab)
Speaker: Kick-off meeting - no speaker today.
Topic: Help welcome new faculty and grads.  Coffee and snacks will be provided.

DB meeting: Friday, September 13th, 2:00 pm, DC1331
Speaker: Feng Zhao
Topic: Exploring Altruistic Links in a Web Crawler
Abstract: This thesis examines issues in the design of an effective Web crawler, which can retrieve high quality pages early and quickly.

A crawler is a program that retrieves Web pages, commonly for use by a search engine, to maintain hypertext structures, or to summarize resources.  It traverses the Web, navigating through hyperlinks, retrieving the content of associated pages. Crawlers are widely used today. The design of a good crawler presents many challenges.

This thesis first discusses the architecture and design of a Web crawler. We then discuss a definition for high quality Web pages. We also introduce the concept of altruistic links, which we define as links pointing from commercial websites to non-commercial websites and vice versa. We suspect that altruistic links are a possible indication that the Web pages they reference are of higher quality, and that such pages may better answer questions from the users of Web search engines. Finally, we try to estimate and compare the quality of Web pages referenced by altruistic links and by other kinds of links.

We implemented a Web crawler and built a set of pilot experiments for finding and crawling altruistic links; we also compared the page quality of the Web pages for different kinds of links.
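
The definition of altruistic links given in the abstract can be illustrated with a minimal sketch. The rule used here to decide whether a site is commercial (a host name ending in ".com") is an assumption made only for illustration, and the function names are hypothetical; the thesis may use a different criterion.

```python
from urllib.parse import urlparse

# Assumption for illustration only: a site counts as "commercial" when its
# host name ends in ".com". This rule is not taken from the thesis.
COMMERCIAL_SUFFIXES = (".com",)


def is_commercial(url: str) -> bool:
    """Return True when the URL's host matches the assumed commercial rule."""
    host = urlparse(url).hostname or ""
    return host.endswith(COMMERCIAL_SUFFIXES)


def is_altruistic(source_url: str, target_url: str) -> bool:
    """Return True when a link crosses the commercial/non-commercial boundary.

    Following the abstract, a link is altruistic when it points from a
    commercial site to a non-commercial one, or vice versa.
    """
    return is_commercial(source_url) != is_commercial(target_url)
```

A crawler could apply such a test to each extracted hyperlink and keep altruistic links in a separate queue, so that the pages they reference can be compared with pages reached through other links.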

Note: this is an MMath thesis presentation

DB meeting: Friday, September 20th, 2:00 pm, DC1331
Speaker: David Toman
Topic: TBA

DB Seminar: Monday, September 23rd, 11:00 AM, DC1304
Speaker: Klaus R. Dittrich, University of Zurich
Title: SINGAPORE: Towards flexible querying of heterogeneous data sources

SCS Research Group Presentation: Friday, September 27th, 3:30 pm, DC1302
Speaker: DB and Programming Languages Groups

DB Seminar: Friday, October 4th, 2:00 PM, DC1304
Speaker: Yelena Yesha, University of Maryland, Baltimore County
Title: Profile Driven Data Management for Pervasive Environments

DB meeting: Friday, October 11th, 2:00 pm, DC1331
Speaker: David DeHaan
Topic: I will give a survey of some algorithms for estimating the number of distinct values in a dataset using limited working memory. Solving this problem using random sampling has been well studied in the statistics literature, and I will not discuss that work other than to present a lower bound on the hardness (Charikar et al., PODS 2000). Instead, I will focus on some recent (and not-so-recent) one-pass algorithms.
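
As a rough illustration of what a one-pass, limited-memory estimator can look like, here is a minimal sketch of the "k minimum values" idea: hash every value, remember only the k smallest distinct hashes, and infer the distinct count from how small the k-th hash is. This is not necessarily one of the algorithms covered in the talk; the function name, the choice of SHA-1 as the hash, and the default k are assumptions made for illustration.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64  # hashes are treated as integers in [0, 2**64)


def estimate_distinct(values, k=256):
    """Estimate the number of distinct values in one pass using O(k) memory."""
    heap = []        # negated hashes: a max-heap over the k smallest hashes
    members = set()  # the same hashes, for constant-time duplicate checks
    for v in values:
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        if h in members:
            continue  # this hash is already among the k smallest
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # The new hash displaces the current largest of the k smallest.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes were seen, so the count is exact
        # (ignoring the negligible chance of a 64-bit hash collision).
        return len(heap)
    kth_smallest = -heap[0]
    # If n distinct hashes are spread uniformly over the hash space, the
    # k-th smallest is expected near k / n of the way through it.
    return (k - 1) * HASH_SPACE // kth_smallest
```

Memory use depends only on k, not on the size of the dataset, and a larger k gives a tighter estimate at the cost of more working memory.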

DB meeting: Friday, October 18th, 2:00 pm, DC1331
Speaker: Gísli Hjaltason
Topic: The Index Fabric: Properties and Challenges
Abstract: This talk describes the Index Fabric, a novel indexing structure that is suitable for large collections of long keys, such as might occur when indexing XML documents. Data represented in the Index Fabric is encoded using semantic tags embedded in the indexed keys, so multiple search paths can be present in the same index. Furthermore, relationships can be represented in the index by building composite keys, thus allowing search over a relationship without requiring join operations. Two challenges faced in reaching a practical implementation of the Index Fabric are briefly described: obtaining a small index structure, and concurrency control. A solution to the former problem is described.

DB Seminar: Monday, October 21st, 11:00 AM, DC1304
Speaker: Jayavel Shanmugasundaram, Cornell University
Title: Bridging Relational Technology and XML

DB meeting: Friday, October 25th, 2:00 pm, DC1331
Speaker: Benjamin Bin Yao
Topic: XBench - A Family of Benchmarks for XML DBMSs
Abstract: XML is beginning to be extensively used in various application domains, and as a result, large amounts of XML documents are being generated. Researchers in both industry and academia have proposed a number of approaches to efficiently store, manipulate, and retrieve XML documents. The individual performance characteristics of these approaches, as well as the relative performance of various systems, are an ongoing concern.

The range of XML applications and the XML data that they manage are quite varied, and no single database schema and workload can properly capture this variety. We propose a family of XML benchmarks, collectively called XBench, to measure and evaluate the performance of different approaches to the management of XML documents. The family is defined according to a classification of applications, and each class has its own database and workload.

We will discuss the general requirements for an XML DBMS benchmark, followed by a detailed explanation of XBench, including the methodology of database generation, the workload, and the setup of the test environment. A brief discussion of other existing XML benchmarks and a comparison among them will be given as well.

DB meeting: Friday, November 1st, 2:00 pm, DC1331
Speaker: Heng Yu
Topic: I will present the paper "Principles and Realization Strategies of Multilevel Transaction Management" by Gerhard Weikum. It appeared in ACM Transactions on Database Systems, Vol. 16, No. 1, 1991.

Traditionally, research on transaction control has focused mainly on high-level abstract object or tuple-level operations. However, the implementation of such operations spans several underlying levels, probably ending at the bottom with page-level operations. For performance reasons, the low-level operations that contribute to high-level ones usually interleave. Concurrency control must take care of all layers, so multilevel transaction management becomes significant. The paper establishes a tree-like framework of multilevel transactions. Based on it, multilevel serializability is studied, and corresponding concurrency control and recovery techniques are developed and evaluated.

DB Seminar: Monday, November 4th, 11:00 AM, DC1304
Speaker: Guozhu Dong, Wright State University
Title: Mining Knowledge about Changes, Differences, and Trends

DB meeting: Friday, November 8th, 2:00 pm, DC1331
Speaker: Ning Zhang
Topic: Optimizing Correlated Path Queries in XML Languages
Abstract: Path expressions are ubiquitous in XML processing languages such as XPath, XQuery, XSLT, and XPointer. Expressions in these languages typically include multiple path expressions, some of which are correlated. Existing approaches evaluate these path expressions one at a time and miss the optimization opportunities that may be gained by exploiting the correlations among them. In this paper, we identify the types of correlations between multiple path expressions in a query, and propose optimizations that are based on pattern graph matching. These optimization techniques rewrite pattern graphs at the logical level and produce a set of equivalent pattern graphs from which a physical optimizer can choose given an appropriate cost function.

DB meeting: Friday, November 15th, 2:00 pm, DC1331
Speaker: Vahid Karimi
Topic: WebDAV and DeltaV for Collaborative Authoring and Versioning
Abstract: Distributed Authoring and Versioning (WebDAV) and DeltaV (DAV Versioning) are defined by the Internet Engineering Task Force (IETF) in RFC 2518 and RFC 3253 respectively. WebDAV and DeltaV both extend HTTP to accomplish the goal of collaborative authoring and versioning on the Web, using XML extensively to accomplish this goal. Therefore there is a close working relationship between the World Wide Web Consortium (W3C) and IETF. 

In this survey, I review the extensions that WebDAV and DeltaV introduce to HTTP. I conclude this survey with the lessons that I think can be learned from this work. 

(This is an MMath essay presentation.)

DB meeting: Friday, November 22nd, 2:00 pm, DC1331
Speaker: Ian Davis
Topic: Integrating XML and SQL
Abstract: There is growing interest in integrating SQL and XML. Data stored in relational tables needs to be translated by SQL engines into XML encoded structures, and XML encoded structures need to be mapped into relational tables.

In this talk I will be discussing my participation in the H2.3 working group, which is writing the SQL XML-Related Specification (SQL/XML).  This section of the SQL standard defines how arbitrary SQL data may be translated without loss of information content into XML. H2.3 reports to SC21 of the ISO, which is responsible for progressing the international SQL standard.

I will also be presenting software I have developed that employs an extended XPath language to map XML data into the desired SQL views.  This software is to be presented at the XML 2002 conference to be held this December in Baltimore.


DB meeting: Friday, November 29th, 2:00 pm, DC1331
Speaker: Gord Cormack
Topic: Question Answering at TREC 2002
Abstract: For the fourth year, the Text Retrieval Conference (TREC) has had a Question Answering task, in which systems find answers to short factual questions within an unstructured corpus of text.  The results are presented in November.  I have just returned from TREC 2002 and will describe the task, the results, and some approaches to the problem.

DB Seminar: Monday, December 2nd, 11:00 AM, DC1304
Speaker: Michael Kifer, SUNY at Stony Brook
Title: FLORA-2: Programming with Logic and Objects

DB meeting: Friday, December 6th, 2:00 pm, DC1331
Speaker: Meng He
Topic: Indexing Compressed Texts (MMath thesis presentation)
Abstract: As a result of the rapid growth of the volume of electronic data, text compression and indexing techniques are receiving more and more attention. These two issues are usually treated as independent problems, but approaches that combine them have recently attracted the attention of researchers.

In this thesis, we review and test some of the more effective and some of the more theoretically interesting techniques. Various compression and indexing techniques are presented, and we also present two compressed text indices. Based on these techniques, we implement a compressed full-text index, so that compressed texts can be indexed to support fast queries without decompressing the whole text. The experiments show that our index is compact and supports fast search.

DB meeting: Friday, December 13th, 2:00 pm, DC1331
Speaker: Ani Nica
Topic: Evaluation of SQL subqueries in relational database systems
Abstract: In this talk I will present some popular techniques for elimination of SQL subqueries in relational database systems. SQL subqueries can be converted into equivalent expressions using special relational operators. The presentation will also cover further optimizations for transforming these special relational operators into general join operators in a cost-based optimizer.

This page is maintained by  Ken Salem.