|DB meeting:||Friday, September 17, 2:00pm, DC1331|
|DB Seminar:||Monday, September 20, 11:00 am, DC1304|
|Speaker:||C. Mohan, IBM Almaden Research Lab|
DBCache: A Project on Database Caching Support for Web Applications
|DB meeting:||Friday, September 24, 2:00pm, DC1331|
|Title:||The DLF Dialects of Description Logic|
|Abstract:||Description logics (DLs) are an important family of modal logics that have been used to formally capture object-relational schemas and UML class diagrams. They also underlie recent efforts to define the Web Ontology Language (OWL), a W3C standard currently under development for capturing structured knowledge of web services. In this talk, I'll give an overview of DLs and their relevance to the database community, with a particular focus on dialects that incorporate features or attributes. I'll conclude with an overview of recent work on incorporating functional dependencies in these dialects.|
|DB meeting:||Friday, October 1, 2:00pm, DC1331|
|Title:||Views-on-Views in SQL Server|
Indexed (a.k.a. materialized) views are used to speed up query processing of data-intensive queries, and most commercial relational database systems now provide some level of support for them. Like indices over base relations, materialized views are used transparently by the query optimizer to speed up queries written over the vocabulary of base relations or logical views. To minimize the online overhead of maintaining a view extent, materialized view definitions in SQL databases are commonly restricted to a class of simple aggregation expressions over joined base relations (called SPJG, or Select-Project-Join-Groupby) because efficient incremental maintenance strategies are known to exist for this class. Systems that support more complex view definitions typically limit them to offline maintenance.
In this talk I will describe work done this summer with Paul Larson to extend SQL Server to support "views-on-views", which are SPJG indexed views that reference other indexed views as if they were base relations. This class of views is interesting because it maintains properties of incremental maintenance, yet it increases the expressive power of the view language to encompass several types of common non-SPJG queries. After exploring some of the advantages of supporting views-on-views, I will describe at a high level the modifications needed to the SQL Server query optimizer to utilize views-on-views for query processing. Finally, I will briefly explore how allowing indexed views to contain Outer Join would extend the power of the view language even further.
|DB Seminar:||Monday, October 4, 4:30pm, DC1350|
|Speaker:||Jim Gray, Microsoft Bay Area Research Center|
On-Line Science: The World-Wide Telescope as a Prototype for the New Computational Science
|DB meeting:||Friday, October 15, 2:00pm, DC1331|
|Title:||An Overview of DL Dialects DLFAD and DLFDreg|
Description Logics (DLs) play an important role in many applications, such as software engineering, medical informatics, digital libraries, web-based information systems, and databases. In databases, for example, query rewriting, duplicate elimination, and many other query optimization operations can be abstracted as DL logical implication problems. In this talk, I will give an overview of the DL dialects DLFAD and DLFDreg. These dialects were proposed by David Toman and Grant Weddell in their papers "Attribute Inversion in Description Logics with Path Functional Dependencies" and "On Reasoning about Structural Equality in XML: A Description Logic Approach".
The main contributions of the work on DLFAD are: (1) the dialect provides both attribute inversion and path functional dependencies; (2) the implication problem for DLFAD is proved undecidable in the general case, by reducing an unrestricted tiling problem to a DLFAD implication problem; (3) the implication problem is proved decidable under a coherence condition, by reducing it to the satisfiability problem for Ackermann formulae.
The main property of DLFDreg is that it uses regular expressions to express functional dependencies over possibly infinite sets of feature paths. This can be used to reason about structural equality in XML. The paper also proves that the implication problem for DLFDreg is decidable, by reducing it to the satisfiability problem for DatalognS with negation.
I will also discuss the relationship between DLFAD and DLFDreg, as well as some possible extensions based on them.
|DB Seminar:||Monday, October 18, 11:00 am, DC1304|
|Speaker:||Gustavo Alonso, ETH Zürich|
Database replication for commodity database services
|Seminar:||Tuesday, October 19, 11:00am, DC1304|
|Speaker:||Brian Cooper, Georgia Institute of Technology|
|Title:||Using information retrieval techniques to route queries in an InfoBeacons network|
|Abstract:||We present the InfoBeacons system, in which a peer-to-peer network of beacons cooperates to route queries to the best information sources. The routing in our system uses techniques adapted from information retrieval. We examine routing at two levels. First, each beacon is assigned several sources and routes queries to those sources. Many sources are unwilling to provide more cooperation than simple searching, and we must adapt traditional information retrieval techniques to choose the best sources despite this lack of cooperation. Second, beacons route queries to other beacons using techniques similar to those for routing queries to sources. We examine alternative architectures for routing queries between beacons. Results of experiments using a beacon network to search 1,000 information sources demonstrate how our techniques can be used to efficiently route queries; for example, our techniques require contacting up to 70 percent fewer sources than random walk techniques.|
|DB meeting:||Friday, October 22, 2:00pm, DC1331|
|Title:||The Physical Data Placement Problem in Shared-Disk Relational Database Systems|
Magnetic disks remain a bottleneck in today's database systems. To improve disk I/O efficiency and hence system performance, database system administrators use two physical data placement schemes, data declustering and data clustering. Data objects, such as tables, indexes and materialized views, can be striped across multiple disks in order to maximize intra-object I/O parallelism (declustering), or can be isolated into separate disks or disk sets in order to maximize inter-object I/O parallelism (clustering).
In this presentation, we describe our study of the performance impact of these data placement schemes. We examined two representative data layouts, partitioning and full-striping. We studied important aspects of disk I/O, such as disk seeks and disk read-ahead, and how they affect I/O efficiency under the two layouts. We also studied how environmental factors, such as the number of concurrent queries and the degree of I/O parallelism, affect the relative performance of the layouts. Our experiments show that physical data placement has a significant impact on disk I/O efficiency and hence on system performance, and that the relative performance of different data layouts depends heavily on the workload and other environmental factors.
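The difference between the two placement schemes can be illustrated with a toy block-to-disk mapping. This is only a sketch under stated assumptions, not the layout logic of any particular database system: the function names, the `stripe_unit` parameter, and the round-robin assignment of objects to disks are all hypothetical, and "partitioning" is assumed here to mean placing each object entirely on one disk.

```python
def full_striping(objects, num_disks, stripe_unit=1):
    """Declustering (toy model): spread every object's blocks over all disks.

    objects maps an object name to its size in blocks. Block b of each
    object is placed on disk (b // stripe_unit) % num_disks.
    """
    return {
        name: [(b // stripe_unit) % num_disks for b in range(num_blocks)]
        for name, num_blocks in objects.items()
    }


def partitioning(objects, num_disks):
    """Clustering (toy model): keep each object on a single disk.

    Objects are assigned to disks round-robin in insertion order, which is
    an assumption made for illustration only.
    """
    return {
        name: [i % num_disks] * num_blocks
        for i, (name, num_blocks) in enumerate(objects.items())
    }


if __name__ == "__main__":
    objects = {"orders": 4, "lineitem": 6}  # hypothetical object sizes in blocks
    print(full_striping(objects, num_disks=2))
    print(partitioning(objects, num_disks=2))
```

Under full striping a scan of a single object can use every disk at once, while under partitioning concurrent scans of different objects do not compete for the same disk.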
|DB meeting:||Friday, October 29, 2:00pm, DC1331|
I plan to discuss view-based database access controls, with
relational and XML examples. I'll be drawing material from
a couple of papers from this year's SIGMOD conference:
|DB meeting:||Friday, November 5, 2:00pm, DC1331|
|Title:||Randomized Databases and Query Sets|
As the complexity of database systems, and in particular of query languages, continues to grow, it is becoming increasingly labour-intensive to develop tests that adequately cover the functionality of these systems. Such testing is vital for commercial and open-source products to ensure correctness and stability. An effective (and cost-effective) solution is to stochastically generate database instances and query workloads, and test their operation. These techniques are also useful for evaluating the performance of systems and algorithms, especially in query processing.
I will give an overview of existing methods for generating random databases and queries, focusing on the generation of SQL statements. I will discuss our experiences at iAnywhere with two types of randomized database and query generation, and compare papers published by Oracle and Microsoft describing their experiences with similar methods.
|DB meeting:||Friday, November 12, 2:00pm, DC1331|
I will present a proposal for storing and querying archived data, as presented
primarily in the following paper:
|DB meeting:||Friday, November 19, 2:00pm, DC1331|
|Title:||BlossomTree: Evaluating XPaths in FLWOR Expressions|
Efficient evaluation of path expressions has been studied extensively. However, evaluating more complex FLWOR expressions that contain multiple path expressions has not been well studied. In this talk, I will present a pattern matching approach, called BlossomTree, to evaluating a FLWOR expression that contains correlated path expressions. BlossomTree is a formalism to capture the semantics of the path expressions and their correlations.
We propose a general algebraic framework (abstract data types and logical operators) to evaluate BlossomTree pattern matching that facilitates efficient evaluation and experimentation. We design efficient data structures and algorithms to implement the abstract data types and logical operators. Our experimental studies demonstrate that the BlossomTree approach can generate highly efficient query plans in different environments.
|DB Seminar:||Monday, November 22, 11:00 am, DC1304|
|Speaker:||Moshe Vardi, Rice University|
A Call to Regularity
|Seminar:||Tuesday, November 23, 10:30am, DC1304|
|Speaker:||Peter Patel-Schneider, Bell Labs Research|
|Title:||What is OWL (and why should I give a hoot)?|
OWL is the new ontology language produced by the W3C Web Ontology Working Group. As such, it is poised to be a major formalism for the design and dissemination of ontology information, particularly in the Semantic Web, a part of the World-Wide Web. OWL has influences from several communities, including the RDF community, the Description Logic community, and the frame community. These influences have resulted in a wide variety of requirements on OWL, several of which appear to be conflicting. OWL contains innovative solutions to several of these apparent conflicts but other conflicts have meant that it has not been possible to satisfy all the desired requirements for OWL.
In this talk I will describe the design and development of OWL, concentrating on what makes OWL important, the relationship of OWL to other efforts, the innovative solutions that were required in its design, and the impact of the conflicting requirements on OWL.
|DB meeting:||Friday, November 26, 2:00pm, DC1331|
|Title:||SPAM Filter Evaluation at TREC|
The purpose of TREC is to provide standard procedures and archival data for the evaluation of the effectiveness of tools for various information retrieval tasks. For TREC 2005 one such task will be spam filtering. A spam filter is an on-line binary classifier that identifies each incoming email message as spam or not.
Measuring the effectiveness of spam filters presents several challenges. The standard of accuracy required for acceptable performance is quite high. The trade-off between false positives (rejected legitimate messages) and false negatives (accepted spam) must be evaluated. The learning characteristics of the filter must be measured. Privacy considerations present a major challenge in constructing an archival test collection.
In this talk I will present the preliminary guidelines and evaluation methods for the 2005 TREC spam evaluation task, for which I am coordinator.
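The trade-off between false positives and false negatives mentioned above can be made concrete with a small calculation. The sketch below is an illustration only and is not the TREC evaluation method: the function name and the boolean encoding (`True` means spam) are assumptions made here.

```python
def spam_filter_error_rates(labels, predictions):
    """Return (false_positive_rate, false_negative_rate) for a binary filter.

    labels and predictions are equal-length sequences of booleans,
    where True means "spam" (an encoding assumed for this sketch).
    A false positive is a legitimate message rejected as spam;
    a false negative is a spam message accepted as legitimate.
    """
    ham_total = sum(1 for y in labels if not y)
    spam_total = sum(1 for y in labels if y)
    false_pos = sum(1 for y, p in zip(labels, predictions) if not y and p)
    false_neg = sum(1 for y, p in zip(labels, predictions) if y and not p)
    fpr = false_pos / ham_total if ham_total else 0.0
    fnr = false_neg / spam_total if spam_total else 0.0
    return fpr, fnr


if __name__ == "__main__":
    labels = [True, True, True, True, False, False]        # 4 spam, 2 legitimate
    predictions = [True, True, True, False, True, False]   # hypothetical filter output
    print(spam_filter_error_rates(labels, predictions))
```

Reporting the two rates separately matters because rejecting a legitimate message is usually far more costly to the user than accepting a spam message.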
|DB meeting:||Friday, December 3, 2:00pm, DC1331|
|Title:||Bloom filters: a survey of their usage in the database management systems|
This talk surveys the use of Bloom filters in database management systems. Topics include integrating cost-based placement of Bloom filters into a query optimizer, using Bloom filters to improve the execution of access plans, and representing sets of cached queries with Bloom filters. In particular, I will talk about the use of Bloom filters in SQL Anywhere Studio, a product of iAnywhere Solutions.
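For readers unfamiliar with the data structure, a minimal Bloom filter might look like the sketch below. It is a generic illustration and makes no claim about how SQL Anywhere or any other product implements it; the class name, the method names, and the choice of SHA-256 with double hashing are assumptions made here.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions from one SHA-256 digest using
        # double hashing: position_i = (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


if __name__ == "__main__":
    build_side_keys = BloomFilter(num_bits=1024, num_hashes=3)
    for key in ["k1", "k2", "k3"]:  # hypothetical join keys
        build_side_keys.add(key)
    print(build_side_keys.might_contain("k2"))
```

In a query plan, a filter like this could be built from the join keys of one input and probed while scanning the other, so that rows whose keys are definitely absent are discarded early.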
|Speaker:||Stanley B. Zdonik, Brown University|
Stream Processing in the Aurora/Borealis System
|DB meeting:||Friday, December 10, 2:00pm, DC1331|
|Title:||Towards Decentralized Access Control Administration|
Access control administration is a mechanism for setting permissions for particular users to access particular data. There are several reasons to decentralize this administration: dynamic changes, different contexts, sharing data, and delegation of the administration. Some existing systems allow some degree of decentralization, but none seems fully adequate. This talk presents recent work towards decentralization of access control administration in which creation-time policies are defined. Others have introduced the idea of distinguishing a virtual access control matrix from the actual one. This allows them to formulate and reason about access control inheritance policies, conflict resolution policies, and default policies. We build on this foundation to describe policies that dictate what initial access control configuration is to be set at the time that an object is created. Some examples of creation-time policies will be discussed.