The Database Seminar Series provides a forum for presentation and discussion
of interesting and current database issues. It complements our internal database
meetings by bringing in external colleagues. The talks that are scheduled for
2002-2003 are below, and more will be listed as we get confirmations. Please
send your suggestions to M. Tamer Özsu.
Unless otherwise noted, all talks will be in room DC 1304. Coffee will be
served 30 minutes before the talk.
We will try to post the presentation notes whenever possible. Please
click on the presentation title to access these notes (usually in pdf format).
The Database Seminar Series is supported by iAnywhere Solutions, A Sybase Company.
23 September 2002, 11:00 AM
Title: |
SINGAPORE:
Towards flexible querying of heterogeneous data sources |
Speaker: |
Klaus
R. Dittrich, University of Zurich |
Abstract: |
Data
available on-line today is spread across heterogeneous data sources like
traditional databases or repositories of various forms containing unstructured
and semistructured data. Obviously, the "technical" availability
alone is not at all sufficient for making meaningful use of existing
information, and thus the problem of effectively and efficiently accessing
and querying heterogeneous data is an important research issue. One popular
approach is to integrate the data sources and offer users an a priori
defined global schema. Alternatively, there are approaches which implement
tools for giving users the possibility to define the query schema themselves.
We propose a new approach where heterogeneous sources can be queried
through a unified interface and underlying sources are integrated by
means of a query language only. We present extensions to OQL which allow
querying structurally heterogeneous data (structured, semistructured
and unstructured alike) and integrating data on the fly. We also
present some details of query preprocessing and show how techniques from
database and information retrieval systems can be combined. |
Bio: |
Prof. Klaus Dittrich received his diploma degree (M.Sc.)
in Computer Science from the University
of Karlsruhe. He earned his Ph.D. in 1982 at IPD
Institute for Program Structures and Data Organization. In 1984 he spent
a year as a post-doctoral fellow at IBM
Almaden Research Center. He was head of the database department at FZI
Research Center for Information Technologies at University of Karlsruhe
from 1985 to 1989.
Since 1989 he has been a Professor of Computer
Science at the University of Zurich and head of the Database
Technology Research Group.
He took a sabbatical leave at Stanford
University, USA and Hewlett
Packard Labs, USA (1996) and was guest professor at Aalborg
University, Denmark (1999).
He is a member of
and the current president of SI (Swiss
Informaticians Society) and former president of IPEG (interuniversitäre
Partnerschaft für Erdbeobachtung und Geoinformatik). He is also
the secretary of the VLDB
Endowment (Very Large Data Base Endowment Inc.). Until 1997 he was
a member of the SIGMOD
Advisory Committee.
Prof. Klaus Dittrich has been nominated as a distinguished speaker of
the IEEE
Europe Distinguished Visitor Program. |
4 October 2002, 2:00 PM (Note the special time.)
Title: |
Profile Driven Data Management for Pervasive
Environments |
Speaker: |
Yelena Yesha,
University of Maryland, Baltimore County |
Abstract: |
The past few years have seen significant work in mobile data management,
typically based on the client/proxy/server model. Mobile/wireless devices
are treated as clients that are data consumers only, while data sources
are on servers that typically reside on the wired network. With the advent
of "pervasive computing" environments an alternative scenario
arises where mobile devices gather and exchange data from not just wired
sources, but also from their ethereal environment and one another. This
is accomplished using ad-hoc connectivity engendered by Bluetooth-like
systems. In this new scenario, mobile devices become both data consumers
and producers. We describe the new data management challenges which this
scenario introduces. We describe the design and present an implementation
prototype of our framework, MoGATU, which addresses these challenges. An
important component of our approach is to treat each device as an autonomous
entity with its "goals" and "beliefs", expressed using
a semantically rich language. We have implemented this framework over a
combined Bluetooth and Ad-Hoc 802.11 network with clients running on a
variety of mobile devices. We present experimental results validating our
approach and measure system performance. |
Bio: |
Yelena Yesha received the B.Sc. degree in Computer Science from York
University, Toronto, Canada in 1984, and the M.Sc. and Ph.D. degrees in
Computer and Information Science from The Ohio State University in 1986
and 1989, respectively.
Since 1989 she has been with the Department of Computer Science and
Electrical Engineering at the University of Maryland Baltimore County,
where she is presently a Verizon Professor. In addition, from December,
1994 through August, 1999 Dr. Yesha served as the Director of the Center
of Excellence in Space Data and Information Sciences at NASA. Her research
interests are in the areas of distributed databases, distributed systems,
mobile computing, digital libraries, electronic commerce, and trusted
information systems. She has published 8 books and over 100 refereed articles
in these areas. Dr. Yesha was a program chair and general co-chair
of the ACM International Conference on Information and Knowledge Management
and a member of the program committees of many prestigious conferences.
She is a member of the editorial board of the Very Large Databases
Journal and the IEEE Transactions on Knowledge and Data Engineering,
and is editor-in-chief of the International Journal of Digital Libraries.
During 1994, Dr. Yesha was the Director of the Center for Applied
Information Technology at the National Institute of Standards and Technology.
Dr. Yesha is a senior member of IEEE, and a member of the ACM. |
21 October 2002, 11:00 AM
Title: |
Bridging Relational Technology and XML |
Speaker: |
Jayavel Shanmugasundaram,
Cornell University |
Abstract: |
XML has emerged as the standard data-exchange format for Internet-based
business applications. These applications introduce a new set of data management
requirements involving XML. However, for the foreseeable future, a significant
amount of business data will continue to be stored in relational database
systems. Thus, a bridge is needed to satisfy the requirements of these
new XML-based applications while still leveraging relational database technology.
In this talk, we shall describe the design and implementation of a middleware
system that we believe achieves this goal. In particular, we shall describe
a general framework for creating XML views of relational data, querying
XML views, and storing and querying XML documents using a relational database
system. Some of the interesting features of the system architecture are
that it (a) provides users with a single XML query language for creating
and querying XML views of relational data, (b) evaluates queries efficiently
by pushing most computation down to the relational database engine, (c)
allows users to query seamlessly over relational data and meta-data,
and (d) allows users to write queries that span XML documents and XML
views of relational data. |
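As a rough illustration of the idea of exposing relational data as an XML view while letting the database engine do the filtering, here is a minimal sketch. It is not the middleware described in the talk: the function name `xml_view`, the `row` element name, and the use of SQLite are all assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_view(conn, table, columns, where="", params=()):
    """Build an XML view of `table` (hypothetical helper, not the talk's system).

    The selection predicate is pushed down to the relational engine as SQL,
    so only qualifying rows are turned into XML elements.
    Table and column names are assumed to come from trusted code.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    root = ET.Element(table)
    for row in conn.execute(sql, params):
        elem = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(elem, col).text = str(value)
    return root
```

Calling `xml_view(conn, "orders", ["id", "customer"], "total > ?", (100,))` returns an element tree whose `row` children correspond only to the rows the database selected.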
Bio: |
Jayavel Shanmugasundaram is an Assistant Professor in the Department
of Computer Science at Cornell University. He received his Ph.D. degree
from the University of Wisconsin at Madison, a master's degree from the
University of Massachusetts at Amherst, and a bachelor's degree from the
Regional Engineering College at Tiruchirappalli, India. Shanmugasundaram's
research interests include Internet data management, database systems and
query-processing in emerging system architectures. He is the author of
several publications and patents, and his research ideas have been implemented
in commercial data management products. |
4 November 2002, 11:00 AM
Title: |
Mining Knowledge about Changes,
Differences, and Trends |
Speaker: |
Guozhu
Dong, Wright State University |
Abstract: |
Knowledge about changes, differences, and trends is very
useful. For example, companies wish to identify important temporal changes
and trends in customer purchase behavior, so that they can adjust their
business priorities. Medical researchers wish to identify differences in
gene group interactions between normal cell tissues and cancer cell tissues,
so that they can discover better treatments for cancer.
We discuss some recent results on mining such knowledge. We are concerned
with transactional data, relational data, and data cubes. We consider
emerging patterns that capture differences and changes between a dataset
pair, gradient patterns in a data cube that capture similar cells with
big differences in measure values, and multidimensional multi-level
trends in sets of time series in a data cube context. We discuss mining
algorithms and ways to use the patterns. |
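To make the notion of an emerging pattern concrete, the sketch below treats it as an itemset whose support grows by at least a given factor from one transactional dataset to another. It is a brute-force illustration under that assumption, not one of the mining algorithms discussed in the talk; the function names and the `min_growth` and `max_size` parameters are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sets of items) that contain `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def emerging_patterns(old, new, min_growth=2.0, max_size=2):
    """Return {itemset: growth rate} for itemsets that are much more
    frequent in `new` than in `old`.

    Growth rate is support(new) / support(old); it is infinite when the
    itemset never occurs in `old`. Itemsets absent from `new` are skipped.
    """
    items = sorted(set().union(*old, *new))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s_old = support(candidate, old)
            s_new = support(candidate, new)
            if s_new == 0:
                continue
            growth = float("inf") if s_old == 0 else s_new / s_old
            if growth >= min_growth:
                found[combo] = growth
    return found
```

Enumerating every candidate itemset is only workable for tiny examples, which is why dedicated mining algorithms are needed in practice.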
Bio: |
Guozhu Dong is an associate professor at Wright State University,
USA. He received his PhD from the University of Southern California in
1988. He previously taught at the University of Melbourne and the Flinders
University, both in Australia, and consulted for Lucent Bell Labs and LIT
Singapore. His main research interests are in the areas of databases, data
mining, and bioinformatics. He has published over 80 articles in these
areas. He has served on numerous program committees, including ICDE, ICDM,
ICDT, PODS, SIGKDD, and VLDB. He is a program co-chair of the International
Conference on Web-Age Information Management (2003), and is on the editorial
board of International Journal of Information Technology. |
2 December 2002, 11:00 AM
Title: |
FLORA-2: Programming with Logic and Objects |
Speaker: |
Michael Kifer, SUNY at
Stony Brook |
Abstract: |
This talk is about a marriage of object-based and logic-based paradigms
for programming knowledge-intensive applications.
The product of this marriage is FLORA-2, which is both a seamless
integration of Frame Logic, HiLog and Transaction Logic in a single
formalism, and an implementation that adds important pragmatic extensions.
Together they make a powerful knowledge programming language.
Frame Logic relates to the object-oriented data model as classical
predicate calculus relates to the relational data model. HiLog adds
meta-programming, and Transaction Logic adds dynamics to the mix.
Although FLORA-2 has been released only in its alpha form, it is already
very usable and has a following of dedicated users in the areas of
information integration, semantic web, information systems design,
agent building, etc. |
Bio: |
Michael Kifer is a Professor with the Department of Computer Science,
State University of New York at Stony Brook (USA). He received his Ph.D.
in Computer Science in 1985 from the Hebrew University of Jerusalem, Israel,
and the M.S. degree in Mathematics in 1976 from Moscow University, Russia.
Dr. Kifer's interests include database systems, knowledge representation,
and Web information systems. He has published two textbooks and numerous
articles in these areas. In 1999 and 2002 he was a recipient of the
ACM-SIGMOD "Test of Time" awards for his works on object-oriented
database languages. |
21 January 2003, 1:00 PM
Title: |
Practical Considerations for Semantic
Cache Management |
Speaker: |
Björn Þór
Jónsson, Reykjavik University |
Abstract: |
The emergence of query-based on-line data services and e-commerce
applications has prompted much recent research on data caching. This
talk describes semantic caching, a caching architecture for such applications
that caches the results of selection queries. Unlike most previous
approaches to caching query results, data is not replicated in the
semantic cache, thus improving the utility of the cache. Furthermore,
partial results are re-used, reducing network traffic. The focus
of the talk is on two performance studies using a prototype implementation
that connects to a commercial relational server. One study focuses
on relatively simple selection workloads and demonstrates several
intrinsic benefits of semantic caching, including low overhead, insensitivity
to the physical layout of the database, reduced network traffic,
the ability to answer some queries without contacting the server,
and the ability to incorporate application knowledge in replacement
decisions. The second performance study focuses on complex selection
workloads. It demonstrates that, despite the increased complexity
of cache management, semantic caching works well in a wide range
of network-constrained environments.
|
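The core idea of semantic caching can be sketched for the simplest case: range selections on a single integer attribute. The cache remembers which ranges it already holds, answers the overlapping part of a new query locally, and asks the server only for the missing remainder. This is a toy illustration, not the prototype evaluated in the talk; the class name and the `fetch_from_server` callable are assumptions.

```python
class SemanticCache:
    """Toy semantic cache for selections `lo <= key < hi` on one attribute."""

    def __init__(self, fetch_from_server):
        # Hypothetical callable: (lo, hi) -> iterable of (key, row) pairs.
        self.fetch = fetch_from_server
        self.regions = []   # sorted, disjoint half-open ranges held in the cache
        self.rows = {}      # key -> row; each tuple is stored only once

    def _remainder(self, lo, hi):
        """Sub-ranges of [lo, hi) that no cached region covers."""
        missing, cursor = [], lo
        for r_lo, r_hi in self.regions:
            if r_hi <= cursor or r_lo >= hi:
                continue
            if r_lo > cursor:
                missing.append((cursor, r_lo))
            cursor = max(cursor, r_hi)
        if cursor < hi:
            missing.append((cursor, hi))
        return missing

    def _add_region(self, lo, hi):
        """Record [lo, hi) as cached, merging adjacent or overlapping regions."""
        merged = []
        for r_lo, r_hi in sorted(self.regions + [(lo, hi)]):
            if merged and r_lo <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], r_hi))
            else:
                merged.append((r_lo, r_hi))
        self.regions = merged

    def query(self, lo, hi):
        """Answer the selection, contacting the server only for the remainder."""
        for m_lo, m_hi in self._remainder(lo, hi):
            for key, row in self.fetch(m_lo, m_hi):
                self.rows[key] = row
            self._add_region(m_lo, m_hi)
        return sorted((k, v) for k, v in self.rows.items() if lo <= k < hi)
```

After querying `[10, 20)` and then `[15, 30)`, only `[20, 30)` is requested from the server for the second query, and a later query for `[12, 25)` is answered without contacting the server at all.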
Bio: |
Dr. Björn Þór Jónsson is an associate
professor in the School of Computer Science at Reykjavík University,
Iceland. His research focuses on database caching architectures and
multimedia database systems, in particular image and text databases.
He has taught classes on database theory and application, database
tuning and advanced database systems. Björn received his Ph.D.
degree in Computer Science from the University of Maryland, College
Park in 1999. The subject of his thesis was "Application-Oriented
Buffering and Caching Techniques". |
14 March 2003, 2:00 PM
Title: |
TelegraphCQ:
Continuous Dataflow Processing for an Uncertain World |
Speaker: |
Michael
Franklin, University of California, Berkeley |
Abstract: |
Increasingly pervasive networks are leading towards
a world where data is constantly in motion. In such a world, conventional
techniques for query processing, which were developed under the assumption
of a far more static and predictable computational environment, will
not be sufficient. In response to this need, the Telegraph project
at Berkeley has developed a suite of novel technologies for continuously
adaptive query processing. We are currently building the next generation
Telegraph system, called TelegraphCQ, which is focused on meeting
the challenges that arise in handling large numbers of continuous
queries over high-volume, highly-variable data streams. In this talk,
I will describe the TelegraphCQ system architecture and its underlying
technology, and report on our ongoing implementation effort leveraging
the PostgreSQL open source code base. I will also discuss our overall
research agenda, including related projects on high-volume XML filtering
and query processing in ad hoc sensor networks. |
Bio: |
Michael Franklin is an Associate Professor of Computer
Science at the University of California, Berkeley. His research focuses
on the architecture and performance of distributed databases and
information systems. He received his Ph.D. from the University of
Wisconsin, Madison in 1993. Previously, he was on the faculty at
the University of Maryland, College Park, where he led projects on
adaptive query processing and data dissemination. He served as Program
Chair for the 2002 ACM SIGMOD Conference and is currently an Editor
of ACM Transactions on Database Systems, Vice Chair of the SIGMOD
Advisory Board, and a member of the Board of Trustees of the VLDB
Endowment. He is also a technology advisor to the Mayfield Fund and
sits on the technology advisory boards of several companies. |
14 April 2003, 11:00 AM
Title: |
Hidden-Web Databases: Classification and
Search |
Speaker: |
Luis Gravano, Columbia
University |
Abstract: |
Many valuable text databases on the web have non-crawlable
contents that are "hidden" behind search interfaces. Hence
traditional search engines do not index this valuable information. One
way to facilitate access to "hidden-web" databases is through
commercial Yahoo!-like directories, which organize these databases manually
into categories that users can browse. In this talk, I will describe
a technique to automate the classification of hidden-web databases. Our
technique adaptively probes the databases with queries derived from document
classifiers, without retrieving any documents. A large-scale experimental
evaluation over 130 real web databases indicates that our technique produces
highly accurate database classification results using, on average, fewer
than 200 queries of four words or fewer to classify a database.
An alternative way to facilitate access to hidden-web databases is through "metasearchers," which
provide a unified query interface to search many databases at once. For
efficiency, a critical task for a metasearcher is the selection of the
most promising databases to search for a query, a task that typically relies
on statistical summaries of the database contents. In this talk,
I will also describe a recent technique to derive content summaries from
hidden-web databases. We exploit our probing-based classification algorithm
to adaptively zoom in on and extract documents that are representative
of the topic coverage of the databases. We can then build content summaries
from these topically-focused document samples. A large-scale experimental
evaluation over a variety of databases indicates that our new content-summary
construction technique is efficient and produces more accurate summaries
than those
from previously proposed strategies. |
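The probing-based classification described above can be sketched as follows: send a few category-specific queries to a database's search interface, look only at the number of matches reported, and descend into a subcategory only when it accounts for enough of the matches. This is a simplified illustration, not the authors' algorithm; the `count_matches` callable, the thresholds, and the category hierarchy are all hypothetical.

```python
def classify(count_matches, node, probes, children, min_share=0.5, min_count=10):
    """Return the most specific categories assigned to a database.

    count_matches: callable(query) -> number of matching documents
                   (only counts are used; no documents are retrieved)
    probes:        {category: [probe query, ...]}
    children:      {category: [subcategory, ...]}
    """
    subcats = children.get(node, [])
    matches = {c: sum(count_matches(q) for q in probes[c]) for c in subcats}
    total = sum(matches.values())
    assigned = []
    for c in subcats:
        # Descend only into subcategories that attract enough matches,
        # both in absolute terms and relative to their siblings.
        if total and matches[c] >= min_count and matches[c] / total >= min_share:
            assigned.extend(
                classify(count_matches, c, probes, children, min_share, min_count)
            )
    return assigned or [node]
```

Because subtrees of rejected categories are never probed, the number of queries issued stays small when the database is topically focused.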
Bio: |
Luis Gravano has been on the faculty of the Computer Science
Department, Columbia University since September 1997, where he has been
an associate professor since July 2002. From January through August 2001,
Luis was a Senior Research Scientist at Google (while on leave from Columbia
University). He received his Ph.D. degree in Computer Science from Stanford
University in 1997. He also received an M.S. degree from Stanford University
in 1994 and a B.S. degree from the Escuela Superior Latinoamericana de
Informatica (ESLAI), Argentina in 1990. Luis is an associate editor of
the ACM Transactions on Information Systems, as well as database program
chair for the upcoming ACM CIKM 2004. Luis is also a recipient of a CAREER
award from the National Science Foundation.
__
This talk describes work performed jointly with Panos Ipeirotis
(Columbia) and Mehran Sahami (Stanford/Google). |
12 May 2003, 11:00 AM
Title: |
Bioinformatics:
Gene Expression Data Analysis |
Speaker: |
Aidong
Zhang, University at Buffalo |
Abstract: |
DNA microarray technology provides a broad
snapshot of the state of the cell by measuring the expression levels of
thousands of genes simultaneously. It has already had a significant impact
on the field of bioinformatics and has posed a unique challenge: information
in gene expression matrices is special in that the sample space and gene
space are of very different dimensionality and it can be studied in either
sample space or gene space. While most of the previous studies focus on
clustering either genes or samples, it is interesting to ask whether we
can partition the complete set of samples into exclusive groups (called
phenotypes) and find a set of informative genes that can manifest the phenotypes.
The mining of phenotypes and informative genes can provide valuable information
to the biologists to understand the roles of genes and the phenotype structure
of samples. In this talk, I will describe new techniques which simultaneously
mine phenotypes and informative genes from gene expression data. These
techniques integrate statistics, data mining, and machine learning methods
in a unique fashion to achieve optimal solutions. |
Bio: |
Aidong Zhang is a Professor in the Department
of Computer Science and Engineering at State University of New York at
Buffalo. She received her Ph.D. degree in computer science from Purdue University,
West Lafayette, Indiana, in 1994. Her research interests include bioinformatics,
multimedia systems, content-based image retrieval, geographical information
systems, and data mining. She serves on the editorial boards of ACM Multimedia
Systems, the International Journal of Multimedia Tools and Applications,
International Journal of Distributed and Parallel Databases, and ACM SIGMOD
DiSC (Digital Symposium Collection).
She was co-chair of the technical program committee for ACM Multimedia
2001. Dr. Zhang is a recipient of the National Science Foundation CAREER
award and SUNY Chancellor's Research Recognition award. |
7 July 2003, 11:00 AM
Title: |
Database Support for Data Mining Applications |
Speaker: |
Wolfgang Lehner,
Technische Universität Dresden |
Abstract: |
Database support for data mining has become an important research
topic. Especially for large high-dimensional data volumes, comprehensive
support from the database side is necessary. In this talk I will
focus on the data intensive subproblem of aggregating high-dimensional
data in all possible low-dimensional projections (for instance estimating
low-dimensional histograms), which occurs in several established
data mining techniques. I will argue that existing OLAP SQL-extensions
are insufficient for high-dimensional data and propose a new SQL-operator,
which seamlessly fits into the set of existing OLAP group-by operators. The
new SQL operator is presented from both an SQL language and an
implementation point of view. Different methods implementing
the operator will be outlined and discussed in the context of the
prototypical implementation within the Postgres database engine.
Performance studies show that the operator yields a large speedup
(up to a factor of 10) over existing methods provided by commercially
available database systems. |
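The subproblem named in the abstract, aggregating data in all low-dimensional projections, can be illustrated with a small in-memory sketch that builds a count histogram for every combination of k columns. This is not the SQL operator proposed in the talk; it is the naive equivalent of running one GROUP BY per column subset, and the function name is an assumption.

```python
from collections import Counter
from itertools import combinations


def projection_histograms(rows, k):
    """Count value combinations in every k-dimensional projection of `rows`.

    `rows` is a list of equal-length tuples. The result maps each tuple of
    column indexes to a Counter over the projected values, i.e. one
    histogram per k-column subset.
    """
    if not rows:
        return {}
    n_dims = len(rows[0])
    histograms = {dims: Counter() for dims in combinations(range(n_dims), k)}
    for row in rows:
        for dims, hist in histograms.items():
            hist[tuple(row[d] for d in dims)] += 1
    return histograms
```

With d columns there are "d choose k" projections, so the number of histograms grows quickly with dimensionality, which is the cost a dedicated operator aims to reduce.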
Bio: |
Please see http://wwwdb.inf.tu-dresden.de |
31 July 2003, 11:00 AM; DC1302
Title: |
Mining the Web: Search Engines |
Speaker: |
Ricardo Baeza-Yates,
University of Chile |
Abstract: |
The Web grows and evolves faster than we would like or expect, posing
scalability and relevance problems for Web search engines. In this
talk we present how mining Web data and usage logs allows us to improve
a search engine in several ways: page ranking, indices and interfaces.
As a corollary we show several interesting relations of different
Web characteristics: structure, dynamics, "quality", etc. Our results
help to understand not only technical issues, but also social ones,
as the Web is the collaborative work of many people, a few publishing,
and all of them querying. |
Bio: |
Ricardo Baeza-Yates obtained a Ph.D. in CS at U. of Waterloo, Canada,
in 1989. In 1992 he was elected president of the Chilean Computer
Science Society (SCCC) until 1995, being elected again for 1997-98.
During 1993, he received the Organization of American States award
for young researchers in exact sciences. In 1994 he received the
award for the best engineering research in the last 4 years from the
Institute of Engineers of Chile. In 1997, with two Brazilian colleagues, he
obtained the COMPAQ prize for the best Brazilian research article
in CS. He was recently elected to the IEEE CS Board of Governors
for the period 2002-04. In 2002 he was appointed to the Chilean Academy
of Sciences, being the first person from computer science to achieve
this position in Chile. Currently he is a professor at the CS department
of the University of Chile, where he was the chair in the period
1993-95. He is also director of the Center for Web Research, a project
funded by the Millennium Scientific Initiative. His research interests
include information retrieval, algorithms, and information visualization.
He is co-author of the book Modern Information Retrieval, published
in 1999 by Addison-Wesley, as well as co-author of the 2nd edition
of the Handbook of Algorithms and Data Structures, Addison-Wesley,
1991; and co-editor of Information Retrieval: Algorithms and Data
Structures, Prentice-Hall, 1992. |