The Database Seminar Series provides a forum for presentation and discussion
of interesting and current database issues. It complements our internal database
meetings by bringing in external colleagues. The talks that are scheduled for
2001-2002 are below, and more will be listed as we get confirmations. Please
send your suggestions to M. Tamer Özsu.
Unless otherwise noted, all talks will be in room DC 1304. Coffee will be
served 30 minutes before the talk.
We will try to post the presentation notes whenever possible. Please
click on the presentation title to access these notes (usually in PDF format).
The Database Seminar Series is supported by iAnywhere Solutions, A Sybase Company.
24 September 2001, 11:00 AM
Title: |
Large-Scale Physical Genome Mapping |
Speaker: |
Anthony J. Bonner, University
of Toronto |
Abstract: |
A major step in understanding the genome of an organism is constructing
a physical map, that is, an assignment of DNA fragments to their locations
on the genome. Building complete maps of large genomes often requires integrating
many kinds of experimental data, each with its own forms of noise, experimental
error and anomalies, and with subtle relationships between the different
forms of data. Unfortunately, the quantity and complexity of the data greatly
complicate the map-assembly process, limiting the effectiveness and flexibility
of many map-assembly algorithms.
This talk outlines the biology of physical genome mapping, the computational
problems involved, and our approach to solving them. This approach
is based on abstracting the experimental data as a graph. Intuitively,
each node in the graph represents a point on the genome, and each edge
represents evidence that two points are close together on a chromosome.
Many forms of physical mapping data can be abstracted in this way.
The result is that genome map assembly becomes largely a problem of
graph manipulation. Graph algorithms and graph visualization play key
roles in our approach to the map-assembly problem. |
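The graph abstraction described in this abstract can be illustrated with a minimal sketch. It is not the speaker's system: the marker names and the evidence pairs are made up, and grouping markers into connected components is only one simple, assumed example of the graph manipulation the abstract mentions.

```python
from collections import defaultdict, deque


def build_graph(evidence):
    """Build an undirected graph: each pair is evidence that two points are close."""
    graph = defaultdict(set)
    for a, b in evidence:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_groups(graph):
    """Group points linked by any chain of evidence (breadth-first search)."""
    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups


# Hypothetical evidence pairs; real mapping data would be noisy and weighted.
evidence = [("m1", "m2"), ("m2", "m3"), ("m4", "m5")]
groups = connected_groups(build_graph(evidence))
print(groups)
```

Each resulting group is a set of points that the evidence ties together. Ordering the points within a group along a chromosome is a separate, harder step that this sketch does not attempt.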
Bio: |
Dr. Bonner is an Associate Professor of Computer Science at the University
of Toronto, where he has been a faculty member since 1991. He received
his M.S. and Ph.D. in Computer Science from Rutgers University in 1990
and 1991, respectively, and his B.Sc. in Mathematics and Physics from the
University of Toronto in 1977. He has held visiting positions at the University
of Pennsylvania and at INRIA-Rocquencourt. From 1977 to 1983, he worked
in signal processing and underwater acoustics at the Defence Research Establishment
Atlantic, in Nova Scotia, Canada. He is the author of over 50 publications
and presentations. |
15 October 2001, 11:00 AM
Title: |
Local Cost Estimation for Global Query
Optimization in a Multidatabase System |
Speaker: |
Qiang Zhu, The University
of Michigan - Dearborn |
Abstract: |
A multidatabase system (MDBS) integrates data from pre-existing autonomous
local databases managed by heterogeneous database management systems, such
as ORACLE, DB2 and ObjectStore, in a distributed environment. A user can
issue a global query on the MDBS to retrieve data from multiple local databases.
The user does not need to know where the data is stored and how the result
is obtained. How to process such a global query efficiently is the task
of global query optimization. There are a number of challenges for performing
global query optimization in an MDBS. Among them, a crucial one is that
some local optimization information, such as local cost models, may not
be available to the global query optimizer because of local autonomy. To
perform global query optimization, methods for estimating necessary local
cost parameters of an autonomous local database system at the global level
are required. Research on exploring such methods has attracted a number
of researchers in the database area. In this talk, we will present several
of our techniques to tackle this challenge, including (1) the query sampling
method, which derives local cost models for an MDBS based on observed costs
of sample queries; (2) the qualitative approach, which extends the query
sampling method for a dynamic multidatabase environment by incorporating
a qualitative variable into a cost model; (3) the fractional analysis,
which estimates a query cost by analyzing its fractions for a gradually-changing
multidatabase environment; and (4) the probabilistic approach, which estimates
query costs based on Markov chain theory for a rapidly-changing multidatabase
environment. Our experimental results demonstrate that this set of techniques
is quite promising in solving the local query cost estimation problem for
various multidatabase environments. |
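As a rough illustration of the query sampling idea (deriving a cost model from observed costs of sample queries), the sketch below fits a straight line by ordinary least squares. The single explanatory variable (result size), the linear form, and the sample figures are all assumptions made for illustration; the abstract does not specify the form of the speaker's cost models.

```python
def fit_linear_cost_model(observations):
    """Least-squares fit of cost = b0 + b1 * rows.

    observations: list of (rows, observed_cost) pairs from sample queries.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two sample queries")
    mean_x = sum(x for x, _ in observations) / n
    mean_y = sum(y for _, y in observations) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in observations)
    if sxx == 0:
        raise ValueError("sample queries must differ in result size")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in observations)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def estimate_cost(model, rows):
    """Predict the cost of a query returning the given number of rows."""
    b0, b1 = model
    return b0 + b1 * rows


# Made-up observations: (result rows, elapsed seconds) for four sample queries.
samples = [(100, 0.7), (200, 1.2), (400, 2.2), (800, 4.2)]
model = fit_linear_cost_model(samples)
print(model, estimate_cost(model, 1000))
```

The global optimizer would run such sample queries against an autonomous local database, fit the model, and then use `estimate_cost` in place of the local cost information it cannot see.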
Bio: |
Dr. Qiang Zhu is an Associate Professor at The University of Michigan
- Dearborn. He received his Ph.D. in Computer Science from the University
of Waterloo in 1995. He also holds an M.Sc. degree from McMaster University
in Canada and an M.Eng. degree from Southeast University in China.
Dr. Zhu is a principal investigator for a number of database research projects
funded by sources including NSF and IBM at The University of Michigan.
He has over 40 research publications in various journals and conference
proceedings. Dr. Zhu has served as a program committee member and session/workshop
chair for a number of international conferences. His current research interests
include query optimization for advanced database systems, multidatabase
systems, Web-based database technology, and data mining. |
12 November 2001, 11:00 AM
Title: |
Database Applications of Role-Based Access
Control |
Speaker: |
Sylvia Osborn, University
of Western Ontario |
Abstract: |
Role-Based Access Control (RBAC) has been receiving a lot of attention
in the last ten years as an alternative to more traditional forms of access
control, namely discretionary and mandatory access control. We will introduce
RBAC models, and show how they are and could be used for relational databases.
As well, we will show some algorithms for role graphs, and describe how
they can be used to integrate security information when two databases or
systems are being integrated. |
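For readers unfamiliar with role graphs, here is a minimal sketch of one common idea: a role acquires the privileges of every role junior to it. The role names, privileges, and dictionary layout are hypothetical, and this is not one of the algorithms presented in the talk.

```python
def effective_privileges(role, juniors, direct):
    """Union of a role's direct privileges and those of all roles junior to it.

    juniors: maps a role to the roles immediately junior to it.
    direct:  maps a role to the set of privileges assigned directly to it.
    """
    result = set()
    visited = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        result |= direct.get(current, set())
        stack.extend(juniors.get(current, ()))
    return result


# Hypothetical role graph: dba is senior to analyst, analyst is senior to clerk.
juniors = {"dba": ["analyst"], "analyst": ["clerk"]}
direct = {
    "clerk": {("orders", "SELECT")},
    "analyst": {("sales", "SELECT")},
    "dba": {("orders", "UPDATE")},
}
analyst_privs = effective_privileges("analyst", juniors, direct)
print(sorted(analyst_privs))
```

In a relational setting, each (table, operation) pair would correspond to a privilege granted on a database object.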
Bio: |
Sylvia Osborn received her PhD from the University of Waterloo in 1978.
Since 1977 she has been a faculty member in the Computer Science Department
at the University of Western Ontario in London. Her research interests
are in role-based access control, object-oriented databases, and database
integration. |
3 December 2001, 11:00 AM
Title: |
Programmability in the SQL Server
DBMS |
Speaker: |
José Blakeley, Microsoft Corp. |
Abstract: |
The SQL Server product is broadening the set of programming languages
that can be used by database developers to write business logic in the
form of functions, procedures, triggers, and types. A key component of
this work is hosting the .NET Common Language Runtime inside the SQL Server
process. This talk will provide an overview of the relevant design decisions
in hosting the .NET Runtime and their impact on performance, scalability,
security, and robustness of the server. I will also describe the new features,
such as functions and types, exposed through the SQL language. |
Bio: |
José Blakeley joined Microsoft in 1994. He served as an architect
for the OLE DB data access interfaces during 1995-1998. He is currently
an architect in the SQL Server product working on server-side programmability
and extensibility. José has authored several conference and journal
articles and book chapters on design aspects of relational and object database
management systems, and data access. Before joining Microsoft, José was
a member of the technical staff with Texas Instruments where he developed
an object database management system. He received a Ph.D. in Computer Science
from the University of Waterloo in 1987. |
1 February 2002, 10:00 AM
Title: |
W3C XML Query WG: A Status Report |
Speaker: |
Paul Cotton, Microsoft Canada |
Abstract: |
The World Wide Web Consortium (W3C) XML Query Working Group [1] was chartered
in September 1999 to develop a query language for XML [2] documents. The
goal of the XML Query Working Group is to produce a formal data model for
XML documents with Namespaces [3] based on the XML Infoset [4] and XML
Schemas [5, 6, 7], a formal semantics for a set of query operators on that
data model, and then a query language with a concrete canonical syntax
based on the proposed operators.
The WG recently produced a revision of the XQuery specifications
[8-11] which are now aligned with the first draft of XPath 2.0 [12].
This talk will provide an update on the current status of XQuery and
XPath. This update will include current information on the status of
the XML Query Data Model, Formal Semantics, and the XQuery and XPath
languages. The talk will also include a list of the important issues
and questions still in front of the XML Query WG.
[1] http://www.w3.org/XML/Query
[2] http://www.w3.org/TR/2000/REC-xml-20001006
[3] http://www.w3.org/TR/REC-xml-names/
[4] http://www.w3.org/TR/xml-infoset/
[5] http://www.w3.org/TR/xmlschema-0/
[6] http://www.w3.org/TR/xmlschema-1/
[7] http://www.w3.org/TR/xmlschema-2/
[8] http://www.w3.org/TR/xmlquery-use-cases
[9] http://www.w3.org/TR/query-datamodel/
[10] http://www.w3.org/TR/xquery/
[11] http://www.w3.org/TR/xquery-operators/
[12] http://www.w3.org/TR/xpath20/
|
Bio: |
Paul Cotton joined Microsoft in May, 2000 and is currently Program Manager
of XML Standards. Paul telecommutes to his Redmond job from his home in
Ottawa, Canada.
Paul has been participating in the W3C XML Activity since mid-1998
when he became IBM's prime representative on the XML Linking and Infoset
Working Groups. Paul has been chairperson of the XML Query Working
Group and a member of the XML Coordination Group since September 1999.
Paul was elected to the W3C Advisory Board in June 2000 and in December
2001 was elected to the first W3C Technical Architecture Group. Paul
is also Microsoft's alternate on the XML Protocol Working Group which
is working on SOAP 1.2.
Paul graduated with an M.Math in Computer Science from the University
of Waterloo in 1974.
|
11 February 2002, 11:30 AM
(Please note the time change)
Title: |
Automatically Tuning DBMS Buffer Pools |
Speaker: |
Pat Martin, Queen's
University |
Abstract: |
The tasks of configuring and tuning large database management systems
(DBMSs) have always been both complex and time-consuming. They require
knowledge of the characteristics of the system, the data, and the workload.
The increasing diversity of the data and the workloads handled by today's
systems is making manual tuning by database administrators almost impossible.
Self-managing DBMSs attempt to solve this problem by shifting responsibility
for tuning and configuration onto the system itself.
In this talk I will first present a self-tuning algorithm, called
the Dynamic Reconfiguration algorithm (DRF), for managing the buffer
pools in a DBMS and discuss the results of a set of experiments to
investigate the performance of the algorithm. I will then propose a
set of principles for self-managing DBMSs and suggest directions
for moving towards the goal of self-management.
|
Bio: |
Patrick Martin is a Professor in the Department of Computing and Information
Science at Queen's University and a Visiting Scientist with the Centre
for Advanced Studies at the IBM Toronto Laboratory. His research interests
include self-managing database systems and web caching. |
11 March 2002, 11:00 AM
Title: |
Timber: A Native XML Database Management
System |
Speaker: |
H.V. Jagadish, University
of Michigan |
Abstract: |
The eXtensible Markup Language (XML) has recently become very popular
as a representation format for a wide variety of data. Large repositories
of XML data have begun to emerge. The effective management of XML in a
database thus becomes a pressing issue. A central challenge in this regard
is the complex and heterogeneous structure of XML data. In this talk, I
will discuss the design of Timber, a native XML database management system
that we are building at the University of Michigan. |
Bio: |
H. V. Jagadish is a Professor of Computer Science and Engineering at
the University of Michigan in Ann Arbor. After earning his PhD from Stanford
in 1985, he spent over a decade at AT&T Bell Laboratories in Murray Hill,
N.J., eventually becoming head of AT&T Labs database research department
at the Shannon Laboratory in Florham Park, N.J. He has also served as a
Professor at the University of Illinois in Urbana-Champaign.
Professor Jagadish is well-known for his broad-ranging research on
databases, and has over 80 major papers and 20 patents. He is currently
the founding editor of the ACM SIGMOD Digital Review. Among many professional
positions he has held, he has previously been an Associate Editor for
the ACM Transactions on Database Systems (1992-1995) and Program Chair
of the ACM SIGMOD annual conference (1996). |
8 April 2002, 11:00 AM
Title: |
Clustering by Impact: Scalable, Incremental
Clustering of Data Streams |
Speaker: |
Daniel Barbara, George Mason
University |
Abstract: |
As organizations tend to store and analyze more and more data, the need
to design and implement data mining algorithms that are incremental and
can deal with continuous data streams is pressing. In this talk we present
two clustering algorithms that meet that need. They cluster new points
by measuring the impact these points have on existing clusters' properties.
In particular, Fractal Clustering deals with numerical domains and uses
the impact over the fractal dimension of the existing clusters to decide
where to place a new point. Its counterpart, Entropy Clustering, is designed
to deal with categorical domains and it uses the impact over the entropy
of the existing clusters to perform clustering. We will show results of
using these techniques over real and synthetic data sets, as well as show
the techniques' relationship to the well-known principle of Minimum Description
Length. We will also show how these techniques are well-suited to track
the evolution of clusters in data streams, due both to the concise representations
of the clustering models they produce and the incremental nature of the algorithms. |
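The "clustering by impact" idea for categorical data can be sketched as follows: try the new point in each cluster and keep the placement that leaves the total entropy lowest. This is a simplified illustration under stated assumptions, not the speaker's Entropy Clustering algorithm; the entropy measure (sum of per-attribute Shannon entropies, weighted by cluster size) and the sample data are chosen here for illustration.

```python
import math
from collections import Counter


def cluster_entropy(points):
    """Sum of per-attribute Shannon entropies (in bits) over a list of tuples."""
    if not points:
        return 0.0
    n = len(points)
    total = 0.0
    for column in zip(*points):
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log2(p)
    return total


def weighted_entropy(clusters):
    """Size-weighted sum of the entropies of all clusters."""
    return sum(len(c) * cluster_entropy(c) for c in clusters)


def assign(clusters, point):
    """Place point in the cluster where it increases total entropy least."""
    best_index, best_score = None, None
    for i in range(len(clusters)):
        trial = [c + [point] if j == i else c for j, c in enumerate(clusters)]
        score = weighted_entropy(trial)
        if best_score is None or score < best_score:
            best_index, best_score = i, score
    clusters[best_index].append(point)
    return best_index


# Made-up categorical records: (colour, size).
clusters = [
    [("red", "small"), ("red", "small")],
    [("blue", "large"), ("blue", "large")],
]
chosen = assign(clusters, ("red", "small"))
print(chosen)
```

Because each point is examined once and only summary counts are needed, the same approach can process a stream incrementally. This sketch recomputes entropies from scratch for clarity and assumes at least one cluster already exists.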
Bio: |
Daniel Barbara is an Associate Professor in the Information and Software
Engineering Department, George Mason University. He got his Ph.D. in Computer
Science at Princeton University in 1985, and has held positions at Panasonic
Laboratories and Bell Communication Research as a Senior Scientist. He
is the author of over 60 publications in international conferences and
research journals. Dr. Barbara's current interests are in the areas of Data
Warehousing and Data Mining. |
29 April 2002, 11:00 AM
Title: |
Generic Model Management - A Database
Infrastructure for Schema Manipulation |
Speaker: |
Philip Bernstein,
Microsoft Research |
Abstract: |
Despite 30 years of research on database support for engineering applications,
such applications remain complicated and hard to build. To improve this
situation by an order of magnitude, a much higher level API is needed.
We present such an interface, called Model Management. Its objects are
models and mappings. By "model," we mean a complex structure
that represents a design artifact, such as a relational schema, XML schema,
object-oriented interface, UML model, web-site map, or software configuration.
By "mapping," we mean an explicit representation of connections
or transformations between two models. The main operations of Model Management
are match, merge, diff, and compose. We explain how these operations can
be used to solve classical meta data management problems and sketch a system
architecture to implement them. |
Bio: |
Phil Bernstein is a researcher at Microsoft Corporation. Over the past
25 years, he has been a product architect at Microsoft and at Digital Equipment
Corp., a professor at Harvard University and Wang Institute of Graduate
Studies, and a VP Software at Sequoia Systems. During that time, he has
published over 100 articles on the theory and implementation of database
systems, and coauthored three books, the latest of which is Principles
of Transaction Processing for the System Professional (Morgan Kaufmann,
1997). He holds a B.S. from Cornell University and a Ph.D. from the University
of Toronto. A summary of his current research on meta data management can
be found at http://www.research.microsoft.com/~philbe. |
10 June 2002, 11:00 AM
Title: |
Location Management in Moving Objects Databases |
Speaker: |
Ouri Wolfson,
University of Illinois at Chicago |
Abstract: |
Consider applications of a database that models information about moving
objects and their location. For example, given a database representing
the location of taxi-cabs, a typical query may be: retrieve the free cabs
that are currently within 1 mile of a customer at 33 N. Michigan Ave.
Military applications utilizing moving objects databases arise in the context
of the digital battlefield, and civilian ones arise in transportation systems
and in systems that track mobile computers for providing context awareness.
Currently, moving objects database applications are being developed
in an ad hoc fashion. Database Management System (DBMS) technology
provides a potential foundation upon which to build these applications;
however, there is a critical set of needed capabilities that are lacking
in existing DBMSs. These include support for continuously changing
data, for integrated spatial and temporal information, and for uncertainty
management. The objective of our DOMINO project is to build an envelope
containing these capabilities on top of existing DBMSs. In this talk
I will describe the problems addressed by the project, and our proposed
solutions.
|
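The taxi-cab query in this abstract can be sketched against a static snapshot of positions. The cab records, the customer coordinates, and the function names below are all hypothetical. A real moving objects database would also have to handle the continuously changing positions and uncertainty that the abstract identifies as the hard part.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def free_cabs_within(cabs, lat, lon, radius_miles):
    """Return ids of free cabs whose last known position is within the radius."""
    return [
        cab["id"]
        for cab in cabs
        if cab["free"]
        and distance_miles(cab["lat"], cab["lon"], lat, lon) <= radius_miles
    ]


# Made-up snapshot of cab positions and a made-up customer location.
cabs = [
    {"id": "A", "lat": 41.8850, "lon": -87.6250, "free": True},
    {"id": "B", "lat": 41.8850, "lon": -87.6250, "free": False},
    {"id": "C", "lat": 41.9500, "lon": -87.6500, "free": True},
]
nearby = free_cabs_within(cabs, 41.8830, -87.6245, 1.0)
print(nearby)
```

Cab B is excluded because it is occupied, and cab C because it is several miles away.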
Bio: |
Ouri Wolfson's main research interests are in database systems, distributed
systems, transaction processing, and mobile computing. He received his
B.A. degree in mathematics, and his Ph.D. degree in computer science from
Courant Institute of Mathematical Sciences, New York University. He is
currently a Professor in the Department of Computer Science at the University
of Illinois at Chicago, where he directs the Databases and Mobile Computing
Laboratory, and the newly established Mobile Information Systems Research
Center. He served as a consultant to Argonne National Laboratory, to the
US Army Research Laboratories, and to the Center of Excellence in Space
Data and Information Sciences at NASA. He is also the founder of Mobitrac,
a high-tech startup company specializing in infrastructure software for
location based services and products. Before joining the University of
Illinois he was on the computer science faculty at the Technion and
Columbia University, and he was a Member of Technical Staff at Bell
Laboratories. Ouri Wolfson has authored over ninety publications in leading
journals and conference proceedings. He is a Fellow of the Association
for Computing Machinery, an editor of the ACM/URSI/Baltzer Wireless Networks
Journal, a Member of the ACM SIGMOD Digital Review Editorial Board and
a guest editor of the ACM/Baltzer Journal on Special Topics in Mobile Networks.
He is also the 2001 recipient of the UIC College of Engineering Faculty
Research Award. He is currently serving as a National Lecturer for the
Association for Computing Machinery professional society. He has participated
in numerous conferences (including ACM-SIGMOD, VLDB, PODS, ICDE, NGITS,
ICDCS, MOBIDATA, DOOD, SSD, GIS, PDIS, CIKM) as a program committee member,
keynote speaker, session chairman, and panelist. Most recently he was the
keynote speaker at the Second International Conference on Mobile Data Management
(MDM 2001), and is the program committee co-chair of the Third International
Conference on Mobile Data Management (MDM 2002). He was also the General
co-Chairman of the IEEE Knowledge and Data Engineering Exchange Workshop,
and he serves on the Advisory Committee of the NSF Center of Research Excellence
in Science and Technology, at Florida A&M University. His work has been
funded by the National Science Foundation, Air Force Office of Scientific
Research, Defense Advanced Research Projects Agency, NATO, US Army, the
New York State Science and Technology Foundation, Hughes Research Laboratories,
and Informix Co. |
8 July 2002, 11:00 AM
Title: |
Caterpillars, T-Graphs and Context |
Speaker: |
Derick Wood, Hong Kong
University of Science and Technology |
Abstract: |
I will present two different ways that we developed for specifying
context in documents. The first, caterpillar expressions, leads to
very nice theoretical questions/problems; the second, T-graphs, supports
a 70% solution that is suitable for most contexts that we need (it
was used in Designer).
I will compare the two methods and summarize their positive and
negative aspects. |
Bio: |
Professor Wood received his BSc and PhD degrees from the University
of Leeds, England, in 1963 and 1968, respectively. He was a Postdoctoral
Fellow at the Courant Institute, New York University, from 1968 to
1970 before joining the Unit of Computer Science at McMaster University
in 1970. He was Chair of Computer Science from 1979 to 1982. From 1982
to 1992 he was a Professor in the Department of Computer Science, University
of Waterloo.
For three years he served as Director of the Data Structuring
Group. Before joining HKUST in 1995, he was a Professor in the
Department of Computer Science, University of Western Ontario.
He has published widely in a number of research areas and has written
two textbooks, "Theory of Computation," published by John Wiley,
and "Data Structures, Algorithms, and Performance," published
by Addison-Wesley. |
2 August 2002, 11:00 AM
Title: |
Robust Space Transformations for
Distance-based Operations |
Speaker: |
Raymond
Ng, University of British Columbia |
Abstract: |
For many database operations, such as nearest neighbor search,
distance-based clustering and outlier detection, there is an underlying
k-D data space in which each tuple/object is represented as a point
in the space. We observe that in the presence of variability, correlation,
outliers and/or differing scales, we may get unintuitive results
if an inappropriate space is used.
The fundamental question addressed in this paper is: "What
then is an appropriate space?" We propose using a robust
space transformation called the Donoho-Stahel estimator.
In the first half of the paper, we show the key properties
of the estimator. Of particular importance to database applications
is the stability property, which says that in spite
of frequent updates, the estimator does not (a) change much,
(b) lose its usefulness, or (c) require re-computation. In
the second half, we focus on the computation of the estimator
for high-dimensional databases. We develop randomized algorithms
and evaluate how well they perform empirically. The bottom line
is that the Donoho-Stahel transformation, which possesses desirable
properties for database operations, can be computed efficiently
as well. |
Bio: |
Raymond Ng received his B.Sc.(Hons) degree in Computer Science
from the University of British Columbia in 1984, his M.Math.
degree in Computer Science from the University of Waterloo in 1986,
and his Ph.D. degree in Computer Science from the University of Maryland,
College Park, in 1992. He is a full professor at the University of
British Columbia. His areas of research include data mining, bioinformatics,
image databases, and multimedia systems. He has published numerous
conference and journal papers on these topics. He is one of the associate
editors for the VLDB journal and the IEEE TKDE journal. He is a program
co-chair of the 2002 SIGKDD Conference, and has served as a member
of program committees for many premier conferences, including ACM
SIGMOD, VLDB, SIGKDD and ACM PODS. |