The Database Seminar Series provides a forum for presentation and discussion
of interesting and current database issues. It complements our internal database
meetings by bringing in external colleagues. The talks that are scheduled for
2005-2006 are below, and more will be listed as we get confirmations. Please
send your suggestions to M. Tamer Özsu.
Unless otherwise noted, all talks will be in room DC 1304. Coffee will be
served 30 minutes before the talk.
We will post presentation notes whenever possible. Please
click on a presentation title to access the notes (usually in PDF format).
The Database Seminar Series is supported by iAnywhere Solutions, A Sybase Company.
19 September 2005, 11:00 AM
Title: | Building a MetaQuerier and Beyond: A Trilogy of Search,
Integration, and Mining for Web Information Access |
Speaker: | Kevin Chang, University
of Illinois, Urbana-Champaign |
Abstract: | While the Web has become the ultimate information repository, several
major barriers have hindered today's search engines from unleashing the
Web's promise. Toward tackling the dual challenges for accessing both
the deep and the surface Web, I will present our "trilogy" of
pursuit:
To begin with, from search to integration: As the Web has deepened
dramatically, much information is now hidden on the "deep Web," behind
the query interfaces of numerous searchable databases. Our 2004 survey
estimated 450,000 online databases and 1,258,000 query interfaces.
We thus believe that search must resort to integration: To enable access
to the deep Web, we are building the MetaQuerier at UIUC for both finding
and querying such online databases.
Further, from integration to mining: Toward large scale integration,
to tackle the critical issue of dynamic semantics discovery, we observe
our key insight that, while the deep Web challenges us for its large
scale, the challenge itself presents a unique opportunity: We believe
that integration must resort to mining, to tackle the deep semantics
by holistically exploring shallow syntactic and statistical regularities hidden across
a large number of sources.
Finally, from mining back to search? Beyond the MetaQuerier, such
holistic mining is equally crucial for the dual challenge of semantics
discovery on the surface Web. We believe such mining must resort to
search, and propose to build holistic analysis into a next-generation
search engine by demonstrating our initial solutions.
Project URL: http://metaquerier.cs.uiuc.edu |
Bio: | Kevin Chen-Chuan Chang is an Assistant Professor in the Department of
Computer Science, University of Illinois at Urbana-Champaign. He received
a PhD in Electrical Engineering in 2001 from Stanford
University. His research interests are in large scale information access,
with emphasis on Web information integration and top-k ranked query processing.
He is the recipient of an NSF CAREER Award
in 2002, an NCSA Faculty Fellow Award in 2003, and IBM Faculty Awards in
2004 and 2005. URL: http://www-faculty.cs.uiuc.edu/~kcchang/ |
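For readers unfamiliar with the holistic mining idea mentioned in the abstract, here is a deliberately tiny sketch of one underlying intuition: attributes that each appear in many query interfaces but rarely co-occur in the same interface are candidate synonyms. The interfaces, scoring heuristic, and names below are invented for illustration; this is not the MetaQuerier's actual algorithm.

```python
# Toy illustration of "holistic" mining over many query interfaces:
# attributes that each appear in many interfaces but rarely co-occur in
# the same interface are candidate synonyms (e.g. "author" vs "writer").
# This is NOT the MetaQuerier algorithm, just a sketch of the intuition.
from itertools import combinations

interfaces = [                      # hypothetical deep-web query interfaces
    {"title", "author", "isbn"},
    {"title", "writer", "format"},
    {"title", "author", "publisher"},
    {"title", "writer", "isbn"},
    {"title", "author", "format"},
]

def cooccurrence_score(a, b, sources):
    """Negative-correlation heuristic: high when a and b are both frequent
    individually but almost never appear together."""
    fa = sum(a in s for s in sources)
    fb = sum(b in s for s in sources)
    fab = sum(a in s and b in s for s in sources)
    if fa == 0 or fb == 0:
        return 0.0
    return (fa * fb) / (len(sources) * (fab + 1))  # +1 smoothing

vocab = sorted(set().union(*interfaces))
candidates = sorted(
    ((cooccurrence_score(a, b, interfaces), a, b)
     for a, b in combinations(vocab, 2)),
    reverse=True)
print(candidates[:3])   # ("author", "writer") should rank near the top
```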
4 October 2005, 11:00 AM; MC 5136 (Please note special location.)
Title: | SMaestro: Second Generation Storage Infrastructure Management |
Speaker: | Kaladhar Voruganti,
IBM Almaden Research Center |
Abstract: | Storage management has now become the largest component of the overall
cost of owning storage subsystems. One key reason for this high cost
is the limit on the amount of storage that can
be managed by a single system administrator. This limit stems from the set
of complex storage management tasks that a system administrator has to
perform, such as storage provisioning, performance bottleneck evaluation,
planning for future growth, backup/restore, security violation analysis,
and interaction with application, network, and database system administrators.
Thus, many storage vendors have introduced storage management tools that
aim to increase the amount of storage that can be managed by a single
system administrator by automating many of these tasks. However,
most of these existing storage management products can generally be classified
as first generation products that provide basic monitoring and workflow
based action support. These tools generally lack analysis and planning
functionality. The objective of this talk is to present the trends in the
planning and analysis area of storage management with specific emphasis
on open research problems. |
Bio: | Kaladhar Voruganti received his BSc in Computer Engineering and PhD in
Computing Science from the University of Alberta in Canada. For the past
six years he has been working as a research staff member at the IBM Almaden
Research lab in San Jose, California. He is currently leading a multi-site
research team that is working on storage management planning tools. Kaladhar
has received an Outstanding Technical Achievement award for his contributions
to the IBM iSCSI storage controller, and another Outstanding Technical Achievement
award for his contributions to IBM storage management products. The IBM iSCSI
target controller received the most innovative product award at the Storage
2001 and Interop 2001 conferences. In the past, Kaladhar has published in
leading database conferences. Currently he is actively publishing in leading
storage systems conferences and has received three IBM Bravo awards for
his publication efforts. |
17 October 2005, 11:00 AM
Title: | Learning in Query Optimization |
Speaker: | Volker Markl,
IBM Almaden Research Center |
Abstract: | Database systems let users specify queries in a declarative language
like SQL. Most modern DBMS optimizers rely upon a cost model to choose
the best query execution plan (QEP) for any given query. Cost estimates
are heavily dependent upon the optimizer's estimates for the number of
rows that will result at each step of the QEP for complex queries involving
many predicates and/or operations. These estimates, in turn, rely upon
statistics on the database and modeling assumptions that may or may not
be true for a given database. In the first part of our talk, we present
research on learning in query optimization that has been carried out at
the IBM Almaden Research Center. We introduce LEO, DB2's LEarning Optimizer,
as a comprehensive way to repair incorrect statistics and cardinality estimates
of a query execution plan. By monitoring executed queries, LEO compares
the optimizer's estimates with actuals at each step in a QEP, and computes
adjustments to cost estimates and statistics that may be used during the
current and future query optimizations. LEO introduces a feedback loop
to query optimization that enhances the available information on the database
where the most queries have occurred, allowing the optimizer to actually
learn from its past mistakes.
In the second part of the talk, we describe how the knowledge gleaned
by LEO is exploited consistently in a query optimizer, by adjusting
the optimizer's model and by maximizing information entropy. |
Bio: | Dr. Markl has been working
at IBM's Almaden Research Center in San Jose, USA, since 2001, conducting
research in query optimization, indexing, and self-managing databases.
Volker Markl is spearheading the LEO project, an effort on autonomic
computing with the goal of creating a self-tuning optimizer for DB2 UDB.
He is also the Almaden chair for the IBM Data Management Professional
Interest Community (PIC). From January 1997 to December 2000, Dr. Markl worked for the Bavarian
Research Center for Knowledge-Based Systems (FORWISS) in Munich, Germany
as deputy research group manager, leading the MISTRAL and MDA projects,
thereby cooperating with SAP AG, NEC, Hitachi, Teijin Systems Technology,
GfK, and Microsoft Research. His MDA project, jointly with TransAction
Software, developed the relational database management system TransBase
HyperCube, which was awarded the European IST Prize 2001 by EUROCASE
and the European Commission. Dr. Markl also initiated and co-ordinated the EDITH EU IST project
investigating the physical clustering of multiple hierarchies and its
applications to GIS and Data Warehousing, which is now being carried
out by FORWISS and several partners from Germany, Italy, Greece, and
Poland. Volker Markl is a graduate of the Technische Universität München,
where he earned a Master's degree in Computer Science in 1995. He completed
his PhD in 1999 under the supervision of Rudolf Bayer. His dissertation
on "Relational Query Processing Using a Multidimensional Access
Technique" was honored "with distinction" by the German
Computer Society (Gesellschaft für Informatik). He also earned
a degree in Business Administration from the University of Hagen, Germany,
in 1995. Since 1996, Volker Markl has published more than 30 reviewed
papers at prestigious scientific conferences and journals, filed more
than 10 patents, and has been an invited speaker at many universities and
companies. Dr. Markl is a member of the German Computer Society (GI)
as well as the Special Interest Group on Management of Data of the
Association for Computing Machinery (ACM SIGMOD). He also serves as a
program committee member and reviewer for several international conferences
and journals, including SIGMOD, ICDE, VLDB, TKDE, TODS, IS, and the
Computer Journal. His main research interests are in autonomic computing,
query processing, and query optimization, but also include applications
like data warehousing, electronic commerce, and pervasive computing. Dr. Markl's earlier professional experience includes work as a software engineer
for a virology laboratory, as part of his military service; lecturer
for software-engineering courses at the University of Applied Sciences
in Augsburg, Germany and for programming and communications at the
Technische Universität München; and consultant for a forwarding
agency. He was awarded a fellowship by Siemens AG, Munich and also
worked as an international intern with Benefit Panel Services, Los
Angeles. |
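The feedback loop described in the LEO abstract above can be pictured with a very small sketch: record the ratio between estimated and actual cardinalities observed at run time, and apply it to later estimates for the same predicate. This is only a conceptual illustration, not DB2's implementation; the class and predicate names are made up.

```python
# Minimal sketch of the feedback-loop idea behind a learning optimizer:
# compare the optimizer's cardinality estimate with the cardinality actually
# observed at run time, remember the ratio, and fold it into later estimates.
# This is only an illustration of the concept, not DB2/LEO internals.

class CardinalityFeedback:
    def __init__(self):
        self.adjustments = {}          # predicate signature -> correction factor

    def record(self, predicate, estimated, actual):
        """Called after execution with the observed row count."""
        if estimated > 0:
            self.adjustments[predicate] = actual / estimated

    def adjust(self, predicate, estimated):
        """Called during optimization of later queries."""
        return estimated * self.adjustments.get(predicate, 1.0)

fb = CardinalityFeedback()
# The optimizer assumed 1% selectivity, but the predicate matched far more rows.
fb.record("orders.status = 'shipped'", estimated=10_000, actual=250_000)
# The next time the predicate is optimized, the estimate is corrected.
print(fb.adjust("orders.status = 'shipped'", estimated=10_000))   # 250000.0
```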
24 October 2005, 11:00 AM
Title: | Approximate Joins: Concepts and Techniques |
Speaker: | Divesh Srivastava,
AT&T Labs-Research |
Abstract: | The quality of the data residing in information repositories and databases
degrades for a multitude of reasons. In the presence of data
quality errors, a central problem is to identify all pairs of entities
(tuples) in two sets of entities that are approximately the same. This
operation has been studied over the years and is known under various
names, including record linkage, entity identification, entity reconciliation,
and approximate join. The objective of this talk is to
provide an overview of key research results and techniques used for approximate
joins. This is joint work with Nick Koudas. |
Bio: | Divesh Srivastava is the head of the Database Research Department
at AT&T Labs-Research. He received his B.Tech. in Computer Science & Engineering
from the Indian Institute of Technology, Bombay, India, and his Ph.D.
in Computer Sciences from the University of Wisconsin, Madison, USA.
His current research interests include XML databases and IP network data
management. |
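As a concrete illustration of the approximate-join operation discussed above, the following sketch pairs records whose token sets exceed a Jaccard-similarity threshold. It is a toy: real approximate joins use richer similarity measures (edit distance, q-grams, tf*idf weighting) and indexes to avoid enumerating all pairs. The tables and threshold are invented.

```python
# A deliberately small approximate join: pair up records from two tables whose
# names exceed a Jaccard-similarity threshold over word tokens.  Production
# approaches use q-grams, edit distance, tf*idf weighting and indexes to avoid
# the quadratic pair enumeration shown here; this is just the core idea.

def tokens(s):
    return set(s.lower().split())

def jaccard(a, b):
    a, b = tokens(a), tokens(b)
    return len(a & b) / len(a | b) if a | b else 0.0

customers = ["AT&T Labs Research", "Intl. Business Machines", "Acme Corp"]
suppliers = ["AT&T Labs - Research", "IBM", "ACME Corporation"]

threshold = 0.5
matches = [(c, s, round(jaccard(c, s), 2))
           for c in customers for s in suppliers
           if jaccard(c, s) >= threshold]
print(matches)   # [('AT&T Labs Research', 'AT&T Labs - Research', 0.75)]
```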
10 November 2005, 11:00 AM
Title: | The Role of Document Structure in Querying, Scoring and Evaluating XML Full-Text Search |
Speaker: | Sihem Amer-Yahia, AT&T
Labs-Research |
Abstract: | A key benefit of XML is its ability to represent a mix of structured
and text data. We discuss the interplay of structured information and
keyword search in three aspects of XML search: query design, scoring
methods and query evaluation. In query design, existing languages for
XML evolved from simple keyword search to queries combining sophisticated
conditions on structure à la XPath and XQuery, and complex full-text search
primitives, such as the use of ontologies and keyword proximity distance,
à la XQuery Full-Text. In XML scoring, methods range from pure IR tf*idf
to approximating and scoring both structure and keyword conditions. In
evaluating XML search, document structure has been used to identify meaningful
XML fragments to be returned as answers to keyword queries, and is
currently being explored to optimize full-text search queries.
This discussion is based on published and ongoing work between
AT&T Labs and UBC, the U. of Toronto, Cornell U., Rutgers U.,
the U. of Waterloo, and UCSD. |
Bio: | Sihem Amer-Yahia is a Senior Technical Specialist at AT&T Labs Research.
She received her Ph.D. degree from the University of Paris XI-Orsay and
INRIA. She has been working on various aspects related to XML query processing.
More recently, she has focused on XML full-text search. Sihem is a co-editor
of the XQuery Full-Text language specification and use cases published
in September 2005 by the W3C Full-Text Task Force. She is the main developer
of GalaTex, a conformance
implementation of XQuery Full-Text. |
14 November 2005, 11:00 AM
Title: | MobiEyes: Distributed Processing of Moving Queries over Moving Objects |
Speaker: | Ling Liu, Georgia Institute
of Technology |
Abstract: | With the growing popularity and availability of mobile communications,
our ability to stay connected while on the move is becoming a reality,
rather than the science fiction it was just a decade ago. An important research challenge
for modern location-based services is the scalable processing of location
monitoring requests over a large collection of mobile objects. The centralized
architecture, though studied extensively in the literature, would create
intolerable performance problems as the number of mobile objects grows
significantly.
In this talk, we present a distributed architecture and a suite
of optimization techniques for scalable processing of continuously
moving location queries. Moving location queries can be viewed
as standing location tracking requests that continuously monitor
the locations of mobile objects of interest and return a subset
of mobile objects when certain conditions are met. We describe
the design of a distributed location monitoring architecture
through MobiEyes, a distributed real time location monitoring
system in a mobile environment. The main idea behind the MobiEyes
distributed architecture is to promote a careful partition of
a real time location monitoring task into an optimal coordination
of server-side processing and client-side processing. Such a
partition allows the location of a moving object to be computed
with a high degree of precision using a small number of location
updates or no updates at all, thus providing highly scalable
and more cost-effective location monitoring services. Concretely,
the MobiEyes distributed architecture not only encourages a careful
utilization of the rapidly growing computational power available
at various mobile devices, such as cell phones, handhelds, and GPS
devices, but also endorses a strong coordination agreement between
the mobile objects and the server. Such an agreement supports
varying location update rates for different mobile users at different
times, and advocates the exploitation of location prediction
and location inference to further constrain the resource/bandwidth
consumption while maintaining satisfactory precision of location
information. A set of optimization techniques is used to further
limit the amount of computation to be handled by the mobile
objects and enhance the overall performance and system utilization
of MobiEyes. Important metrics to validate the proposed architecture
and optimizations include messaging cost, server load, and amount
of computation at individual mobile objects. Our experimental
results show that the MobiEyes approach can lead to significant
savings in terms of server load and messaging cost when compared
to solutions relying on central processing of location information
at the server. If time permits, at the end of my talk, I will
also give an overview of the location privacy protection in LBS. |
Bio: | Ling Liu is currently an associate professor at the College
of Computing at Georgia Tech. She
directs the research programs in the Distributed Data Intensive Systems Lab,
examining research issues and technical challenges in building scalable
and secure distributed data intensive systems. Her current research interests
include performance, scalability, security and privacy issues in networked
computing systems and applications, in particular, mobile location based
services and distributed enterprise computing systems. She has published
over 150 international journal and conference articles. She has served
as PC chair of several IEEE conferences, including as co-PC chair
of the IEEE 2006 International Conference on Data Engineering (ICDE 06) and
as vice chair of the Internet Computing track of the IEEE 2006 International
Conference on Distributed Computing (ICDCS 06), and is on the editorial
board of several international journals, serving as an associate editor
of IEEE Transactions on Knowledge and Data Engineering (TKDE), the International
Journal of Very Large Databases (VLDBJ), and the International Journal of
Web Service Research. Most of Dr. Liu's recent research has been sponsored
by NSF, DoE, DARPA, IBM,
and HP. |
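The client/server partition described in the MobiEyes abstract can be illustrated with a minimal sketch: each mobile object watches its own grid cell locally and contacts the server only when it crosses a cell boundary, rather than on every position change. The class, cell size, and movement trace below are hypothetical; the real system adds query-aware monitoring regions, velocity-based prediction, and other optimizations.

```python
# Sketch of the client/server split used by distributed moving-query systems:
# each mobile object is assigned a grid cell (its "monitoring region") and only
# reports to the server when it leaves that cell, instead of on every movement.
# Hypothetical names; real systems (e.g. MobiEyes) add query-aware regions,
# velocity-based prediction and other optimizations.

CELL = 100.0                           # grid cell size in metres

def cell_of(x, y):
    return (int(x // CELL), int(y // CELL))

class MobileClient:
    def __init__(self, x, y):
        self.cell = cell_of(x, y)
        self.updates_sent = 0

    def move_to(self, x, y):
        new_cell = cell_of(x, y)
        if new_cell != self.cell:      # boundary crossing -> report to server
            self.cell = new_cell
            self.updates_sent += 1     # stand-in for a server message

client = MobileClient(10, 10)
for step in range(1, 200):             # drift slowly to the east
    client.move_to(10 + step, 10)
print(client.updates_sent)             # 2 messages instead of 199 raw updates
```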
5 December 2005, 11:00 AM
Title: | Implementing XQuery 1.0: The Story of Galax |
Speaker: | Mary Fernández,
AT&T Labs - Research |
Abstract: | XQuery 1.0 and its sister language XPath 2.0 have set a fire underneath
database vendors and researchers alike. More than thirty commercial and
research XQuery implementations are listed on the XML Query working group
home page. Galax (www.galaxquery.org) is an open-source, general-purpose
XQuery engine, designed to be complete, efficient, and extensible. During
Galax's development, we have focused on each of these three requirements
in turn, while never losing sight of the other two.
In this talk, I will describe how these requirements have impacted Galax's
evolution and our own research interests. Along the way, I will show
how Galax's architecture supports these three requirements.
Galax is joint work with Jérôme Siméon, IBM T.J.
Watson Research Center. |
Bio: | Mary Fernandez is Principal Technical Staff at AT&T Labs - Research.
Her research interests include data integration, Web-site implementation
and management, domain-specific languages, and their interactions. She
is a member of the W3C XML Query Language Working Group, co-editor of several
of the XQuery W3C working drafts, and is a principal designer and implementor
of Galax, a complete, open-source implementation of XQuery (www.galaxquery.org).
Mary is also an associate editor of ACM Transactions on Database Systems
and serves on the advisory board of MentorNet (www.mentornet.net), an e-mentoring
network for women in engineering and science. |
16 January 2006, 11:00 AM - MC 5136 (Please note room change.)
Title: | Discovering Interesting Subsets of Data in Cube Space |
Speaker: | Raghu Ramakrishnan, University
of Wisconsin - Madison |
Abstract: | Data Cubes have been widely studied and implemented, and so we researchers
shouldn't be thinking about them anymore, right? Wrong. In this talk,
I'll try to convince you that the multidimensional model of data ("cube" sounds
so much cooler) provides the right perspective for
addressing many challenging tasks, including dealing with imprecision,
mining for interesting subsets of data, analysis of historical stream data,
and world peace. The talk will touch upon results from a couple of VLDB
2005 papers, and some recent ongoing work. |
Bio: | Raghu Ramakrishnan is Professor of Computer Sciences at the University
of Wisconsin-Madison, and was founder and CTO of QUIQ, a company that
pioneered collaborative customer support (acquired by Kanisa). His research
is in the area of database systems, with a focus on data retrieval, analysis,
and mining. He and his group have developed scalable algorithms for clustering,
decision-tree construction, and itemset counting, and were among the
first to investigate mining of continuously evolving stream data. His
work on query optimization and deductive databases has found its way
into several commercial database systems, and his work on extending SQL
to deal with queries over sequences has influenced the design of window
functions in SQL:1999.
He is Chair of ACM SIGMOD, on the Board of Directors of ACM SIGKDD and
the Board of Trustees of the VLDB Endowment, an associate editor of ACM
Transactions on Database Systems, and was previously editor-in-chief
of the Journal of Data Mining and Knowledge Discovery and the Database
area editor of the Journal of Logic Programming. Dr. Ramakrishnan is
a Fellow of the Association for Computing Machinery (ACM), and has received
several awards, including a Packard Foundation Fellowship, an NSF Presidential
Young Investigator Award, and an ACM SIGMOD Contributions Award. He has
authored over 100 technical papers and written the widely-used text "Database
Management Systems" (WCB/McGraw-Hill),
now in its third edition (with J. Gehrke). |
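As a minimal reminder of what the multidimensional model provides, the sketch below computes the aggregates that SQL's GROUP BY CUBE would produce over a toy fact table, i.e. one total per subset of the dimensions. The data and dimension names are invented.

```python
# A tiny illustration of the multidimensional ("cube") view of data: compute
# the SUM(sales) aggregate for every subset of the dimensions, i.e. what
# SQL's GROUP BY CUBE(region, product) would produce.  Purely illustrative.
from itertools import combinations
from collections import defaultdict

rows = [   # (region, product, sales)
    ("east", "widget", 10),
    ("east", "gadget", 5),
    ("west", "widget", 7),
]
dimensions = ("region", "product")

def cube(rows, dimensions):
    result = {}
    for k in range(len(dimensions) + 1):
        for group in combinations(range(len(dimensions)), k):
            agg = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in group)
                agg[key] += row[-1]            # SUM over the measure column
            result[tuple(dimensions[i] for i in group)] = dict(agg)
    return result

for group_by, totals in cube(rows, dimensions).items():
    print(group_by or ("ALL",), totals)
# () -> grand total 22; ('region',) -> east 15, west 7; and so on.
```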
13 February 2006, 11:00 AM
Title: | Racer - Optimizing in ExpTime and Beyond: Lessons Learnt and Challenges Ahead |
Speaker: | Volker Haarslev, Concordia University |
Abstract: | In February 2004 the Web Ontology Language (OWL) was adopted by the W3C as a recommendation and emerged as a core standard for knowledge
representation on the web. The sublanguage OWL-DL is a notational
variant of the well-known description logic SHOIN(Dn-), which has
decidable inference problems but is also known to be NExpTime-complete. The availability of OWL-DL generated significant interest in
OWL-compliant assertional description logic reasoners. Racer was the
first highly optimized assertional reasoner for the very expressive
(ExpTime-complete) description logic SHIQ(D-), which covers most
parts of OWL-DL with the exception of so-called nominals. In this talk I will briefly introduce description logics / OWL-DL and
associated inference services. Afterward, I will discuss the
architecture of the description logic reasoner Racer and highlight
selected tableau optimization techniques, especially on assertional
reasoning and its relationship to database technology. Several
recently devised optimization techniques were introduced due to
requirements from semantic web applications relating huge amounts of
(incomplete) data to ontological information. I will conclude my
presentation with an outlook on OWL 1.1 and ongoing and future
description logic research such as explanation of reasoning and
adding uncertainty as well as database support in Racer Pro. The research on Racer is joint work with Ralf Moeller, Hamburg
University of Technology. |
Bio: | Dr. Haarslev obtained his doctoral degree from the University of
Hamburg, Germany, specializing in user interface design. His early
research work was in compilers, interfaces and visual languages. His
current work is in automated reasoning, especially description
logics, which play important roles in database technology and
Internet technology. For databases, description logics allow the
integration of heterogeneous data sources. For Internet technology,
description logics are the logical foundation of the web ontology
language (OWL) and form the basis of the semantic web, the emerging
next generation of the World Wide Web. Dr. Haarslev is internationally regarded for his substantial research
contributions in the fields of visual language theory and description
logics. He is a principal architect of the description logic and OWL reasoner Racer, which can be considered a key component of the
emerging semantic web. Dr. Haarslev holds the position of Associate
Professor in the Department of Computer Science and Software
Engineering at Concordia University. He leads a research group
working on automated reasoning and related database technology in the
context of the semantic web. Dr. Haarslev is also cofounder of the
company Racer Systems, which develops and distributes Racer Pro, the
commercial successor of Racer. |
17 April 2006, 11:00 AM
Title: | Entity Resolution in Relational Data |
Speaker: | Lise Getoor, University of Maryland |
Abstract: | A key challenge for data mining is tackling the problem of mining richly structured datasets,
where the objects are linked in some way. Links among the objects may demonstrate certain patterns,
which can be helpful for many data mining tasks and are usually hard to capture with traditional
statistical models. Recently there has been a surge of interest in this area, fueled largely by
interest in web and hypertext mining, but also by interest in mining social networks, security and
law enforcement data, bibliographic citations and epidemiological records.
In this talk, I'll begin with a short overview of this newly emerging research area. Then, I
will describe some of my group's recent work on link-based classification and entity resolution
in relational domains. I'll spend the majority of the time describing our work on entity resolution.
I'll describe the framework and algorithms that we have developed, present results on several real-world
datasets, and describe our work on making the algorithms scalable.
Joint work with students: Indrajit Bhattacharya, Mustafa Bilgic, Louis Licamele, and Prithviraj Sen. |
Bio: | Prof. Lise Getoor is an assistant professor in the Computer Science Department at the
University of Maryland, College Park. She received her PhD from Stanford University in 2001.
Her current work includes research on link mining, statistical relational learning and
representing uncertainty in structured and semi-structured data. Her work in these areas has
been supported by NSF, NGA, KDD, ARL and DARPA. In July 2004, she co-organized the third in
a series of successful workshops on statistical relational learning,
http://www.cs.umd/srl2004. She has published numerous
articles in machine learning, data mining, database, and AI forums. She is a member of the AAAI
Executive Council, is on the editorial boards of the Machine Learning Journal and JAIR, and has
served on numerous program committees, including AAAI, ICML, IJCAI, KDD, SIGMOD, UAI, VLDB, and WWW. |
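As a toy illustration of the entity-resolution problem discussed above, the sketch below merges author references whose names are sufficiently similar and then takes the transitive closure with union-find. The names and threshold are invented, and the sketch ignores the relational evidence (such as co-author links) that the collective approaches described in the talk exploit.

```python
# Toy entity resolution: declare two author references the same entity when
# their names are similar enough, then take the transitive closure with a
# union-find structure.  The relational/collective techniques described above
# additionally use links (e.g. co-authors) as evidence; this sketch does not.
from difflib import SequenceMatcher

refs = ["J. Smith", "John Smith", "J Smith", "Jane Smyth", "L. Getoor"]

parent = list(range(len(refs)))
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]   # path halving
        i = parent[i]
    return i
def union(i, j):
    parent[find(i)] = find(j)

def similar(a, b, threshold=0.8):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

for i in range(len(refs)):
    for j in range(i + 1, len(refs)):
        if similar(refs[i], refs[j]):
            union(i, j)                 # transitivity merges the J./John/J variants

clusters = {}
for i, r in enumerate(refs):
    clusters.setdefault(find(i), []).append(r)
print(list(clusters.values()))
# [['J. Smith', 'John Smith', 'J Smith'], ['Jane Smyth'], ['L. Getoor']]
```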
15 May 2006, 11:00 AM - MC 5136 (Please note room change.)
Title: | Nile: Data Streaming in Practice |
Speaker: | Walid Aref, Purdue
University |
Abstract: | Emerging data streaming applications pose new challenges to
database management systems. In this talk, I will focus on two
applications, namely mobile objects and phenomena detection and
tracking applications. I will highlight new challenges that these
applications raise and how we address them in the context of Nile,
a data stream management system being developed at Purdue. In
particular, I will present new features of Nile, including
incremental evaluation of continuous queries, supporting "predicate
windows" using views, and stream query processing with relevance
feedback. I will demonstrate the use and performance gains of
these features in the context of the above two
applications. Finally, I will talk about ongoing research in Nile
and directions for future research.
|
Bio: | Walid G. Aref is a professor of computer science at
Purdue. His research interests are in developing database
technologies for emerging applications, e.g., spatial,
spatio-temporal, multimedia, bioinformatics, and sensor
databases. He is also interested in indexing, data mining, and
geographic information systems (GIS). Professor Aref's research has
been supported by the National Science Foundation, Purdue Research
Foundation, CERIAS, Panasonic, and Microsoft Corp. In 2001, he
received the CAREER Award from the National Science Foundation and
in 2004, he received a Purdue University Faculty Scholar
award. Professor Aref is a member of Purdue's Discovery Park Bindley
Bioscience and Cyber Centers. He is on the editorial board of the
VLDB Journal, a senior member of the IEEE, and a member of the ACM.
|
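The incremental evaluation of continuous queries mentioned in the abstract can be sketched with a sliding-window aggregate that adds arriving tuples and subtracts expired ones instead of recomputing over the whole window. The class and window size below are hypothetical and are not Nile's API.

```python
# Sketch of incremental evaluation for a continuous query: a sliding-window
# average maintained by adding arriving readings and subtracting expired ones,
# instead of recomputing the aggregate over the window at every step.
# Illustrative only; names are hypothetical, not Nile's API.
from collections import deque

class SlidingAverage:
    def __init__(self, window):
        self.window = window           # window size in number of tuples
        self.buffer = deque()
        self.total = 0.0

    def insert(self, value):
        self.buffer.append(value)
        self.total += value            # positive tuple: incremental add
        if len(self.buffer) > self.window:
            self.total -= self.buffer.popleft()   # expired tuple: subtract
        return self.total / len(self.buffer)      # current answer

q = SlidingAverage(window=3)
for reading in [10, 20, 30, 40]:
    print(q.insert(reading))           # 10.0, 15.0, 20.0, 30.0
```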
5 June 2006, 11:00 AM
Title: | Data Mining using Fractals and Power Laws |
Speaker: | Christos Faloutsos,
CMU |
Abstract: | What patterns can we find in bursty web traffic? On the web, or in the internet graph itself? How about the distribution of galaxies in the sky, or the distribution of a company's customers in geographical space? How long should we expect a nearest-neighbor search to take when there are 100 attributes per patient or customer record? The traditional assumptions (uniformity, independence, Poisson arrivals, Gaussian distributions) often fail miserably. Should we give up trying to find patterns in such settings?
Self-similarity, fractals, and power laws are extremely successful in describing real datasets (coastlines, river basins, stock prices, brain surfaces, and communication-line noise, to name a few). We show some old and new successes, involving modeling of graph topologies (internet, web, and social networks); modeling galaxy and video data; dimensionality reduction; and more. |
Bio: | Christos Faloutsos holds a Ph.D. degree in Computer Science from the University of Toronto, Canada. He is currently a professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award by the National Science Foundation (1989), seven "best paper" awards, and four teaching awards. He has published over 130 refereed articles, one monograph, and holds four patents. His research interests include data mining, fractals, indexing in multimedia and bio-informatics databases, and database performance. |
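A worked example of the power-law theme: data following y ≈ C·x^a plots as a straight line on log-log axes, so ordinary least squares on the logarithms recovers the exponent. The sketch below fits synthetic Zipf-like data; the numbers are invented for illustration.

```python
# Minimal check for power-law behaviour: data following y ~ C * x^a is a
# straight line on a log-log plot, so an ordinary least-squares fit of
# log(y) against log(x) recovers the exponent a.  Synthetic data only.
import math

def fit_power_law(xs, ys):
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / \
            sum((a - mx) ** 2 for a in lx)
    intercept = my - slope * mx
    return slope, math.exp(intercept)       # exponent and constant C

# Zipf-like synthetic data: frequency proportional to rank^-1.2
xs = list(range(1, 101))
ys = [1000.0 * x ** -1.2 for x in xs]
exponent, constant = fit_power_law(xs, ys)
print(round(exponent, 2), round(constant, 1))   # -1.2 1000.0
```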
10 July 2006, 11:00 AM
Title: | Dynamic Programming for Join Ordering Revisited |
Speaker: | Guido Moerkotte, University of Mannheim |
Abstract: | Two approaches to derive dynamic programming algorithms for constructing join trees are described in the literature. We show analytically and experimentally that these two variants exhibit vastly diverging runtime behaviors for different query graphs. More specifically, each variant is superior to the other for one kind of query graph (chain or clique), but fails for the other. Moreover, neither of them handles star queries well. This motivates us to derive an algorithm that is superior to the two existing algorithms because it adapts to the search space implied by the query graph. |
Bio: | From 1981 to 1987 Guido Moerkotte studied computer science at the Universities of Dortmund, Massachusetts, and Karlsruhe. The University of Karlsruhe awarded him a Diploma (1987), a doctorate (1989), and a postdoctoral lecture qualification (1994). In 1994 he became an associate professor at RWTH Aachen. Since 1996 he has held a full professorship at the University of Mannheim, where he heads the database research group. His research interests include databases and their applications, query optimization, and XML databases. Guido Moerkotte has (co-)authored more than 100 publications and three books. |
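To make the dynamic-programming formulation concrete, here is a compact subset-driven enumerator in the spirit of the algorithms the talk compares: the best plan for a set of relations is built from the best plans of two disjoint subsets connected by a join predicate. The cardinalities, selectivities, and cost model are invented, and the sketch omits the enumeration-order refinements that the talk is actually about.

```python
# A compact subset-driven dynamic programming join enumerator (in the spirit of
# DPsub): the best plan for each subset of relations is built from the best
# plans of two disjoint, connected subsets.  Cost model and numbers are made up;
# the talk's point is that how subsets are enumerated matters for performance.
from itertools import combinations

cards = {"R": 1000, "S": 100, "T": 10}                  # base cardinalities
sel = {frozenset("RS"): 0.01, frozenset("ST"): 0.1}     # join selectivities

def cardinality(rels):
    card = 1.0
    for r in rels:
        card *= cards[r]
    for pred, s in sel.items():
        if pred <= rels:
            card *= s
    return card

best = {frozenset({r}): (cards[r], r) for r in cards}    # subset -> (cost, plan)

for size in range(2, len(cards) + 1):
    for combo in combinations(cards, size):
        S = frozenset(combo)
        for k in range(1, size):
            for left in map(frozenset, combinations(S, k)):
                right = S - left
                # both halves must have plans and be joined by some predicate
                if left in best and right in best and \
                   any(pred & left and pred & right for pred in sel):
                    cost = best[left][0] + best[right][0] + cardinality(S)
                    plan = (best[left][1], best[right][1])
                    if S not in best or cost < best[S][0]:
                        best[S] = (cost, plan)

print(best[frozenset("RST")])    # cheapest plan and cost for the full query
```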
26 July 2006, 11:00 AM
Title: | A System for Data, Uncertainty, and Lineage |
Speaker: | Jennifer Widom, Stanford University |
Abstract: | Trio is a new type of database system that manages uncertainty and lineage of data as first-class concepts, along with the data itself. Uncertainty and lineage arise in a variety of data-intensive applications, including scientific and sensor data management, data cleaning and integration, and information extraction systems. This talk will survey our recent and current work in the Trio project: the extended-relational "ULDB" model upon which the Trio system is based, Trio's SQL-based query language (TriQL) including formal and operational semantics, a selection of new theoretical challenges and results, Trio's initial prototype implementation, and our planned research directions.
Trio web site: http://www-db.stanford.edu/trio/ |
Bio: | Jennifer Widom is a Professor in the Computer Science and Electrical Engineering Departments at Stanford University. She received her Bachelor's degree from the Indiana University School of Music in 1982 and her Computer Science Ph.D. from Cornell University in 1987. She was a Research Staff Member at the IBM Almaden Research Center before joining the Stanford faculty in 1993. Her research interests span many aspects of nontraditional data management. She is an ACM Fellow and a member of the National Academy of Engineering, was a Guggenheim Fellow, and has served on a variety of program committees, advisory boards, and editorial boards. |
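A toy rendition of the "uncertainty plus lineage" idea from the abstract: base tuples carry alternatives with confidences, and a derived tuple records which alternatives it came from and (assuming independence) the product of their confidences. The relations and numbers are invented; this is not Trio's ULDB model or TriQL.

```python
# Toy rendition of the "uncertainty plus lineage" idea: base tuples carry
# alternatives with confidences, and a derived tuple records which base
# alternatives it came from (its lineage) and, assuming independence, the
# product of their confidences.  This is a sketch, not Trio's ULDB model.

saw = [  # witness sightings: (id, alternatives as (suspect, confidence))
    ("s1", [("Jimmy", 0.6), ("Billy", 0.4)]),
]
drives = [  # car registrations: (id, alternatives as (suspect, confidence))
    ("d1", [("Jimmy", 0.8)]),
]

def join_on_suspect(saw, drives):
    """Derive 'accused' tuples; each result keeps lineage to its sources."""
    results = []
    for sid, s_alts in saw:
        for did, d_alts in drives:
            for suspect, p1 in s_alts:
                for suspect2, p2 in d_alts:
                    if suspect == suspect2:
                        results.append({
                            "suspect": suspect,
                            "confidence": round(p1 * p2, 2),
                            "lineage": [(sid, suspect), (did, suspect)],
                        })
    return results

for tup in join_on_suspect(saw, drives):
    print(tup)
# {'suspect': 'Jimmy', 'confidence': 0.48, 'lineage': [('s1', 'Jimmy'), ('d1', 'Jimmy')]}
```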