The Database Seminar Series provides a forum for presentation and discussion
of interesting and current database issues. It complements our internal database
meetings by bringing in external colleagues. The talks scheduled for
2007-2008 are listed below.
Unless otherwise noted, all talks will be in room DC 1304. Coffee will be
served 30 minutes before the talk.
We will post presentation notes whenever possible; please click on a
presentation title to access them (usually in PDF format).
Database Seminar Series is supported by iAnywhere Solutions, A Sybase Company.
10 September 2007, 10:30 AM
Title: | The CarTel Mobile Sensor System |
Speaker: | Samuel Madden, MIT |
Abstract: | CarTel is a software and hardware platform for collecting and processing
data from moving vehicles. Data is captured from GPS, Wi-Fi, and OBD-II
interfaces and stored in a local database on each car. This stored data is
transferred opportunistically, via available Wi-Fi networks, cellular
modems, or by "muling" data on a user's cell phone or USB key, to a central "portal", where users can browse and visualize it. To allow non-expert
users to specify what data they would like to collect from remote vehicles,
CarTel includes a simple database-like interface for programming and configuration.
In this talk, I will discuss the architecture of the CarTel data management
system and mention some applications of the technology to traffic and fleet
management, driving safety, and fuel conservation. I will also show a
brief demo of the portal software, which provides a map-based interface for
browsing about 1100 hours of driving data collected by our system from
six cars over the past year. |
Bio: | Sam Madden is an associate professor in the EECS department at MIT and a
member of MIT's Computer Science and Artificial Intelligence Laboratory
(CSAIL). His research interests span all areas of database systems; past
projects include the TinyDB system for data collection from sensor networks
and the Telegraph adaptive query processing engine. His current research
focuses on modeling and statistical techniques for value prediction and
outlier detection in sensor networks, high performance database systems, and
networking and data processing in disconnected environments. He received
his Ph.D. from UC Berkeley in 2004. |
15 October 2007, 10:30 AM
Title: | Sorting and indexing with partitioned B-trees |
Speaker: | Goetz Graefe, HP Labs |
Abstract: | Even after decades of research and industrial development in B-tree indexes and sorting algorithms, some fairly simple yet effective improvements remain possible. This talk describes one such improvement: partitioning within a B-tree based on an artificial leading key column. It has significant impact on multiple important database operations. First, during external merge sort, partitioned B-trees enable accurate deep read-ahead and dynamic allocation of memory and threads. Second, during index creation, partitioned B-trees reduce the delay until the first query can search the new index. Third, during data loading, partitioned B-trees permit very fast integration of new data into a fully indexed database. Finally, partitioned B-trees enable or speed up several other operations, including indexing of real-time data streams. By exploiting the duality of B-trees and external merge sort, partitioned B-trees simplify and unify data structures and algorithms, yet offer these benefits using only techniques already supported in modern relational database management systems.
|
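The artificial leading key column can be pictured with a small sketch. The toy class below (hypothetical names for illustration, not Graefe's implementation) keeps all entries in a single ordered structure keyed by (partition number, key), so each partition behaves like a sorted run that can later be merged into one:

```python
import bisect
import heapq

class PartitionedBTree:
    """Toy model of a partitioned B-tree: a single ordered structure
    whose keys are (partition_no, key), where partition_no is the
    artificial leading key column.  Hypothetical API for illustration
    only, not Graefe's implementation."""

    def __init__(self):
        self.entries = []  # kept sorted: (partition_no, key, value)

    def insert(self, partition, key, value):
        bisect.insort(self.entries, (partition, key, value))

    def partitions(self):
        return sorted({p for p, _, _ in self.entries})

    def scan(self, partition):
        # Range scan on the leading column: all entries of one
        # partition, in key order.
        lo = bisect.bisect_left(self.entries, (partition,))
        hi = bisect.bisect_left(self.entries, (partition + 1,))
        return self.entries[lo:hi]

    def merge_all(self, target=0):
        # Each partition is a sorted run, so collapsing all partitions
        # into one is exactly the merge phase of an external sort.
        runs = [[(k, v) for _, k, v in self.scan(p)]
                for p in self.partitions()]
        self.entries = [(target, k, v) for k, v in heapq.merge(*runs)]
```

Because every partition is internally sorted on the real key, merging partitions is a standard run merge, which is the duality between B-trees and external merge sort that the abstract exploits.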
Bio: | Goetz Graefe recently joined HP Labs to pursue research into database management in support of HP's renewed interest in database management software and hardware. He spent 12 years in product development at Microsoft after several years in academic research and teaching. His publications can be found in ACM Computing Surveys as well as leading journals and conferences in the database field. They have been honored by an ACM SIGMOD "Test of time" award as well as the inaugural IEEE Data Engineering "Influential paper" award. His research has been adopted by Sequent (now IBM), Informix (now IBM), Texas Instruments, Tandem (now HP), Microsoft, and others. |
19 November 2007, 10:30 AM
Title: | File Systems Are Broken (And What We're Doing To Fix Them) |
Speaker: | Remzi Arpaci-Dusseau, University of Wisconsin |
Abstract: | In this talk, I will present a summary of our recent research
in understanding how disks fail and how file and storage systems
handle such failures. Our findings reveal numerous design
and implementation problems in a wide range of both open-source
and commercial systems; put simply, file systems are broken
(at least when it comes to reliability).
I will then present a number of current research directions that we
are pursuing in order to build a new generation of robust and
reliable storage systems. With more formal analysis and construction
techniques, we hope to transform the "art" of file system design and
implementation into a more rigorous and careful science.
|
Bio: | Remzi Arpaci-Dusseau is an associate professor of Computer
Sciences at the University of Wisconsin, Madison. He received
his undergraduate degree from the University of Michigan and
Masters and Ph.D. from the University of California, Berkeley.
Remzi and his wife and colleague Andrea Arpaci-Dusseau
have spent the past 10 years working on I/O and storage
issues, with a recent focus on reliability. Details of their
work can be found at: http://www.cs.wisc.edu/adsl.
|
10 December 2007, 10:30 AM
Title: | Supporting Scalable Online Statistical Processing |
Speaker: | Christopher Jermaine, University of Florida |
Abstract: | Data warehousing and analytic processing have been active areas of
database research and development for nearly two decades. Many experts
now consider these topics to be "solved", especially with regard to performance. However, an argument can be made that users and databases
have simply reached an uneasy truce with regard to analytic
processing. If users avoid ad-hoc, exploratory queries that might take
days to execute, then of course the database performs just fine.
In this talk, I will describe query processing in a database system
called DBO that is designed from the ground up to support interactive
analytic processing. DBO can run database queries from start to finish
and produce exact answers in a scalable fashion. However, unlike any
existing research or production system, DBO is able to produce
statistically meaningful approximate answers at all times throughout
query execution. These answers are continuously updated from start to
finish, even for "huge" queries requiring arbitrary quantities of
temporary secondary storage. Thus, a user can stop execution whenever satisfied with the query accuracy, which may translate to dramatic
time savings during exploratory processing. |
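The continuously refined estimates that DBO produces can be illustrated with a much simpler cousin, classic sample-based online aggregation. The sketch below is illustrative only (DBO's actual estimators are far more sophisticated and also cover joins and out-of-core execution): it scans a table in random order and reports a running estimate of SUM with a normal-approximation confidence bound.

```python
import random
import statistics

def online_sum(table, report_every=100):
    """Scan `table` (a list of numbers) in random order, yielding a
    running (estimate, half_width) for SUM(table) after every
    `report_every` rows.  95% interval via the normal approximation,
    no finite-population correction.  Shuffles a copy so the caller's
    list is untouched."""
    rows = list(table)
    random.shuffle(rows)  # randomized access order, as online aggregation assumes
    n = len(rows)
    seen = []
    for i, row in enumerate(rows, 1):
        seen.append(row)
        if i % report_every == 0 or i == n:
            estimate = n * statistics.fmean(seen)  # scale sample mean to full table
            if 1 < i < n:
                half = 1.96 * n * statistics.stdev(seen) / i ** 0.5
            else:
                half = 0.0  # one row: no spread yet; full scan: exact answer
            yield estimate, half
```

When the scan completes, the estimate collapses to the exact answer, mirroring DBO's ability to run queries to completion while remaining useful at every point in between.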
Bio: | Chris Jermaine is an assistant professor of computer science at the
University of Florida. His research is generally concerned with
databases and data analysis. With his students and colleagues at UF he
has published papers in forums such as ACM TODS, VLDB, SIGMOD, and
ICDE. He is the recipient of an NSF CAREER award for his research on
approximate query processing. |
14 January 2008, 10:30 AM
Title: | Cyber Communities: Enabling Innovation in Science and Engineering |
Speaker: | Ahmed Elmagarmid, Purdue University |
Abstract: | Cyber Infrastructure is the chief enabler for innovation in many of today's most pressing Science and Engineering problems. There are in fact complete areas in Science such as Systems Biology that would not exist if not for a flourishing computing and information discipline. In this presentation we will introduce cyber infrastructure (CI), its chief components, and the transition from being an enhancer for discovery to an enabler. We will then introduce Cyber Communities as a viable way to build new CI for domain sciences. We will show through example and demonstration a few existing projects in the Cyber Center at Purdue where high performance computing, hub technologies, data analytics and visualization all come together to solve specific problems. |
Bio: | Professor Elmagarmid is the director of the Indiana Center for Database Systems and the Cyber Center in Discovery Park. He received a Presidential Young Investigator award from the National Science Foundation, and distinguished alumni awards from Ohio State University and the University of Dayton in 1993 and 1995, respectively. His research work has focused on database systems. He has served the database community extensively through conferences, journals, and professional societies. Professor Elmagarmid was a chief scientist in the Office of Strategy and Technology at Hewlett-Packard (HP). While at HP, he was responsible for software strategy coming out of the corporate CTO office and contributed to cross-company roadmap initiatives. He also served on the technology council for HP. Professor Elmagarmid serves as a consultant in the areas of computing technology, the role of technology in societal development, and strategic planning. He has several international clients in Qatar and Italy. Within North America, he has worked for Telcordia Technology, Bellcore, IBM, CSC, Harris, D. H. Brown and Associates, MCC, Bell Northern Research, Molecular Design Labs, King & Spalding LLP, and UniSql, to name a few. |
11 February 2008, 2:30 PM (Please note change of time)
Title: | Depth Estimation for Ranking Query Optimization |
Speaker: | Neoklis Polyzotis, University of California - Santa Cruz |
Abstract: |
A relational ranking query uses a scoring function to limit the
results of a conventional query to a small number of the most
relevant answers. The increasing popularity of this query paradigm
has led to the introduction of specialized rank join operators that
integrate the selection of top tuples with join processing. The
number of input tuples that an operator accesses is called the input
depth of the operator, and this is the driving cost factor in rank
join processing. This introduces the problem of depth estimation,
which is crucial for the costing of rank join operators during query
compilation and thus for their integration in optimized physical plans.
This talk will provide an overview of our recent work on DEEP, an
estimation methodology for approximating the depths of rank join
operators in a physical plan. DEEP relies on a general, principled
framework in order to formalize depth computation in terms of the
joint distribution of scores in the base tables. This results in a
systematic estimation methodology that takes the characteristics of
the data directly into account and thus enables more accurate
estimates. The talk reviews the main technical points of our work and
presents experimental results that validate the benefits of the
proposed technique over existing methods.
|
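The notion of input depth can be made concrete with a toy rank join. The function below is a simplified HRJN-style sketch with hypothetical names, not the DEEP estimator itself: it joins two score-descending inputs on key, stops as soon as the top-k combined scores are guaranteed, and reports how many tuples it consumed from each side.

```python
def rank_join_depth(left, right, k):
    """Toy rank join over two non-empty inputs, each a list of
    (key, score) pairs sorted by descending score.  Returns the top-k
    combined scores of matching pairs plus the number of tuples consumed
    from each input -- the operator's "input depths".  A didactic
    HRJN-style sketch, not the DEEP estimator."""
    seen_l, seen_r = {}, {}
    results = []
    i = j = 0
    top_l, top_r = left[0][1], right[0][1]
    cur_l, cur_r = top_l, top_r
    while i < len(left) or j < len(right):
        # Alternate between inputs, falling back to whichever has tuples left.
        if (i <= j and i < len(left)) or j >= len(right):
            key, cur_l = left[i]
            i += 1
            seen_l.setdefault(key, []).append(cur_l)
            results += [cur_l + s for s in seen_r.get(key, [])]
        else:
            key, cur_r = right[j]
            j += 1
            seen_r.setdefault(key, []).append(cur_r)
            results += [s + cur_r for s in seen_l.get(key, [])]
        results.sort(reverse=True)
        # No unseen pair can beat this bound, so we may stop early.
        threshold = max(top_l + cur_r, cur_l + top_r)
        if len(results) >= k and results[k - 1] >= threshold:
            break
    return results[:k], i, j  # i and j are the input depths
```

The returned depths are exactly the quantity DEEP tries to predict at optimization time, before the operator ever runs.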
Bio: |
Alkis Polyzotis is currently an assistant professor at UC Santa Cruz.
His research focuses on database systems, and in particular on
approximate query answering, on-line database tuning, and P2P
databases. He received an NSF CAREER award in 2004 and IBM Faculty
Awards in 2005 and 2006. He received his PhD from the
University of Wisconsin at Madison in 2003.
|
31 March 2008, 10:30 AM
Title: | Towards Schema-Free Exchange and Update of XML Data |
Speaker: | Denilson Barbosa, University of Calgary |
Abstract: | We consider the problem of updating XML data in the context of casual, non-expert users exchanging data (e.g., through Web data sharing services) with limited or no schema-knowledge. We introduce a novel paradigm for updating XML data based on simple yet powerful update operations. In particular, we propose effective methods for translating data from one representation into another and also for determining the appropriate locations for effecting the updates without violating the schemas for the data sources. For concreteness, we discuss an intuitive update language that frees the user from specific schema knowledge and can be implemented with our framework. Moreover, our proposal is much simpler than current XML update languages, and, as such, it is appropriate for non-expert users. We discuss alternative semantics for the update operations as well as efficient algorithms for implementing them. We present an experimental analysis with real XML data from several domains, showing that our method is efficient, highly effective, and accurate. |
Bio: | Denilson Barbosa has been an Assistant Professor at the University of Calgary since 2005, which he joined after receiving a PhD from the University of Toronto. His research interests are in databases and the Web, and he has worked on the storage, update, and consistency checking of XML data, as well as in mappings between disparate data formats. He also worked on autonomic data management tools. Denilson is the recipient of an IBM Faculty Award, and the Alberta Ingenuity Fund New Faculty Award. |
2 June 2008, 10:30 AM
Title: | New Challenges in the Management of Data |
Speaker: | Amr El Abbadi, University of California, Santa Barbara |
Abstract: |
Data management has evolved significantly over the last decade. A traditional database managed well-defined, structured data, answered precise queries, and stored the data in trustworthy settings. Recent applications have expanded the scope of the data managed to include unstructured and semi-structured data as well as non-persistent data such as data streams. The expected queries are diverse and imprecise, and include on-line analytical processing and ranking queries. The data itself is often stored with untrusted third parties, who are nonetheless expected to manage it and answer queries. Many of these innovative and challenging settings form the basis of the research projects we are exploring at UCSB. In this talk I will discuss recent results related to the execution of privacy preserving operations, detecting fraud in advertisement click streams, and using hardware to overcome performance limitations of software solutions.
|
Bio: |
Amr El Abbadi received his Ph.D. in Computer Science from Cornell University. In August 1987 he joined the Department of Computer Science at the University of California, Santa Barbara, where he is currently a Professor and Chair of the department. He has been a visiting professor/scientist at the University of Campinas in Brazil, IBM Almaden Research Center, the Swedish Institute of Computer Science in Stockholm, Sweden and in IRISA at the University of Rennes in France. In 2002-03, he was director of the University of California Education Abroad Program in Cairo, Egypt. He has served as area editor for Information Systems: An International Journal and as an editor of Information Processing Letters (IPL). He served on many database and distributed systems program committees. Currently, he is a board member of the VLDB Endowment. In 2007, he received the UC Santa Barbara Senate Graduate Mentor Award. Prof. El Abbadi's main research interests and accomplishments have been in understanding and developing basic mechanisms for supporting distributed information management systems, including databases, digital libraries, peer-to-peer systems, and spatial databases.
|
16 June 2008, 10:30 AM
Title: | The Demaq System: Declarative Messaging and Queueing |
Speaker: | Carl-Christian Kanne, University of Mannheim |
Abstract: | The goal of the Demaq project is to investigate a novel way of thinking about distributed applications that are based on the asynchronous exchange of XML messages. Unlike today's solutions that rely on imperative programming languages and multi-tiered application servers, Demaq uses a declarative language for implementing the application logic as a set of rules. A rule compiler transforms the application specifications into execution plans against the message history, which are evaluated using our optimized runtime engine. This allows us to leverage existing knowledge about declarative query processing for optimizing distributed applications. |
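The rule-based flavor of the abstract can be approximated in a few lines. The sketch below is a hypothetical miniature, not Demaq's language (which compiles rules into query plans over an XML message store): it treats an application as (predicate, action) rules evaluated over incoming messages, where actions may enqueue further messages.

```python
def run_rules(rules, inbox):
    """Evaluate declarative (predicate, action) rules over a message
    history.  Each incoming message is appended to the history, and
    every rule whose predicate matches may enqueue new messages via
    action(msg, history).  Illustrative only; termination is the
    rules' responsibility."""
    history, queue = [], list(inbox)
    while queue:
        msg = queue.pop(0)
        history.append(msg)          # the append-only message history
        for predicate, action in rules:
            if predicate(msg):
                queue.extend(action(msg, history))
    return history
```

The key design point carried over from the abstract is that the application state *is* the message history, so the logic can be expressed, and optimized, as queries over it.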
Bio: | Carl-Christian Kanne is an assistant professor of Information Systems at the University of Mannheim (since 2006), currently on leave as the interim head of the database group at the University of Zurich. His research focuses on data management in a variety of domains, including life sciences and support for distributed business processes. He obtained his Master's (Diplom) and PhD degrees from RWTH Aachen University and University of Mannheim, respectively. |
23 June 2008, 10:30 AM
Title: | OLAP on Sequence Data |
Speaker: | Eric Lo, Hong Kong Polytechnic University |
Abstract: | Many kinds of real-life data exhibit logical ordering among their data items and are thus sequential in nature. However, traditional
online analytical processing (OLAP) systems and techniques were
not designed for sequence data and they are incapable of supporting
sequence data analysis. In this talk, I will present the concept of Sequence OLAP, or S-OLAP for short.
The biggest distinction of S-OLAP from traditional OLAP is that a sequence can
be characterized not only by the attribute values of its constituent items,
but also by the subsequence/substring patterns it possesses. This talk will
cover the concepts of sequence cuboid and sequence data cube as well as the
implementation issues. |
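The idea that a cube cell is defined by a pattern, not just by attribute values, can be sketched directly. The helper below uses illustrative names and data shapes (not the actual cuboid implementation): it groups event sequences by an ordinary attribute together with whether they contain a given subsequence pattern.

```python
from collections import Counter

def contains_subsequence(seq, pattern):
    """True iff `pattern` occurs in `seq` as a (not necessarily
    contiguous) subsequence, e.g. "AB" occurs in "AXB"."""
    it = iter(seq)
    return all(sym in it for sym in pattern)

def sequence_cuboid(sessions, pattern):
    """Count sequences per (attribute, has-pattern) cell.  `sessions`
    is a list of (attribute, event-string) pairs -- an illustrative
    shape only.  The S-OLAP point: which cell a sequence falls into
    depends on a pattern predicate, not just on attribute values."""
    cells = Counter()
    for attr, seq in sessions:
        cells[(attr, contains_subsequence(seq, pattern))] += 1
    return cells
```

A full S-OLAP system would organize many such pattern-based cuboids into a sequence data cube and support roll-up and drill-down across them; this sketch shows only a single cell grouping.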
Bio: | Eric Lo has been an assistant professor at the Hong Kong Polytechnic University (HKPU)
since 2007. He received his Bachelor's degree from HKPU and Master's degree from
the University of Hong Kong (HKU). He received his PhD degree from ETH Zurich
in 2007. His research interests include data engineering, software engineering,
and the fusion of both. |