Fall 2000
DB
meeting: |
Friday, September 8, 2:00 pm, DC1331 |
Speaker: |
N/A |
Topic: |
1. General introductions around the table, including brief descriptions
of research interests.
2. Lab update.
3. Speaker rotation. |
DB
meeting: |
Friday, September 15, 2:00 pm, Niagara
Room, Sybase iAnywhere Solutions, 415 Phillip St. |
Speaker: |
Anil Goel |
Topic: |
I'll talk about the issue of creating and maintaining low cost histograms
and describe the algorithms presented in the following paper:
Self-tuning Histograms: Building Histograms Without Looking at Data,
Ashraf Aboulnaga and Surajit Chaudhuri, SIGMOD 99 |
Note: |
This talk will take place at Sybase and will
be followed by a brief tour for those who are interested. Sybase is located
at 415 Phillip Street, just north of the corner of Phillip and Columbia,
and a short walk from the Davis Centre. Enter through the front door, and
sign in with the receptionist.
A group will leave the Davis Centre on foot for
Sybase at 1:45pm sharp. We'll depart from the Davis Centre north door,
near ICR. Join us if you'd like some company for the walk. |
DB
meeting: |
Friday, September 22, 2:00pm, DC1331 |
Speaker: |
Heng Yu |
Topic: |
A look at What's not in a name: Some Properties of a Purely Structural
Approach to Integrating Description Logic Terminologies, by Alex Borgida
and Ralf Kuesters, presented at 2000 International Workshop on Description Logics.
One approach to integrating knowledge bases is based on finding assertions
that relate the expressions in the constituent terminologies. For knowledge
bases with many terms this task requires computer support. The authors
set up a formal framework for merging Description Logic TBoxes, and then explore
the limits of a purely structural approach to the problem of finding inter-relationships
between knowledge bases. Some theoretical notions are empirically examined
in a real medical ontology (GALEN). |
References: |
(For Description Logic Background:)
Description Logics in Data Management, A. Borgida. IEEE TKDE
1995
(For Schema Integration Background:)
A Comparative Analysis of Methodologies for Database Schema Integration,
C. Batini, M. Lenzerini, S. Navathe, ACM Computing Surveys, 18(4), 1986
(For More Detailed Work of the Authors:)
"What's not in a name?" Initial Explorations of a Structural Approach
to Integrating Large Concept Knowledge-Bases, A. Borgida, R. Kuesters,
Technical Report DCS-TR-391, CS Rutgers Univ. 1999 |
Snacks: |
Anil Goel |
DB
meeting: |
Friday, September 29, 2:00 pm,
DC1304
(Not the usual room!) |
Speaker: |
Vlado Keselj |
Topic: |
A Unification-based Approach to Question Answering
The problem of Question Answering (QA) as used in TREC can be formulated
as follows: given a collection of natural-language (NL) documents,
find an answer to a given NL query that is a short substring of one of the
documents and found in a relevant context.
I will:
- discuss the problem in the context of the problems of IR, IE, and classical
QA, and why it is important,
- give a description of the TREC QA task,
- present some related systems: AskJeeves.com, FAQ Finder, predictive annotation,
and ExtrAns, and
- present a novel approach based on stochastic unification-based grammars.
|
Snacks: |
Heng Yu |
DB
meeting: |
Friday, October 6, 2:00 pm, DC1331 |
Speaker: |
Gord Cormack |
Topic: |
Question Answering at TREC 9
For TREC 9, we completed some of the experiments that we didn't get
done in time for TREC 8. We parsed the questions, and used the parse to
enhance the set of search terms and to locate the answer within candidate
passages. I will present some results and ideas for further development. |
Snacks: |
Vlado Keselj |
DB
meeting: |
Friday, October 13th, 2:00 pm, DC1331 |
Speaker: |
Bill O'Connell, IBM Toronto |
Topic: |
DB2 Universal Database in the Astronomical Expansion of the e-business
Universe
This talk will explain how DB2 is evolving to meet the extensive demands
of e-business. Starting with an assessment of the role databases play in
the application development process in the world of e-business, I will
show you how users leverage the core e-business standards optimized in
DB2 application development. Next, I'll address the expansion of the new
functions in DB2 V7 for UNIX, Windows and OS/2 that address business intelligence
environments in the e-business world. This includes features to support
rolling OLAP functions, enhancement to OLAP cube support, advanced statistical
analysis, complex query cache management, and how DB2 helps OLAP and mining
tools, and ERP applications. The session will also include an analysis
of XML in the handling of business objects and business-to-business and
a demonstration of DB2's capabilities in this domain as it relates to business
intelligence. |
SPECIAL: |
Friday, October 20, 2:30-4:00 PM, Humanities
Theatre |
Speaker: |
Donald E. Knuth, Professor Emeritus, Stanford Univ. |
Title: |
The Joy of Asymptotics |
DB
meeting: |
Friday, October 27th, 2:00 pm, DC1331 |
Speaker: |
Airi Salminen |
Topic: |
Grammars++ revisited
I will briefly describe the Grammars++ model for text, jointly developed
with Frank Tompa. The model is based on the notion of a constraining grammar.
Given a context-free grammar G as a base grammar, a constraining grammar
is derived from productions of G by attaching predicates to selected non-terminals.
Sequences of constraining grammars are then used as filters to specify
a subset of information in structured text.
I will also give a quick overview of some successful applications of
the model for data retrieval, transformation, and hypertext creation. I
will then discuss the potential for applying the model to XML data and
introduce some interesting research problems.
Reference: A. Salminen and F.W. Tompa, "Grammars++ for Modelling Information
in Text," Information Systems, Vol. 24, No. 1 (1999) 1-24. |
Snacks: |
Gord Cormack |
DB
meeting: |
Friday, November 3rd, 2:00pm, DC1331 |
Speaker: |
Gary Promhouse, Open Text |
Topic: |
Vertical Relational Database Representations
We start with a brief review of the literature on vertical database
representations. We then describe a representation that achieves significant
compaction, while simultaneously supporting the basic operations of selection
and join without auxiliary index structures. Experiments on some real-life
tables have realized several orders of magnitude reduction in representation
size, while simultaneously greatly reducing the computational cost of these
fundamental operations. |
DB
seminar: |
Monday, November 6, 11:00 AM, DC 1304 |
Speaker: |
Hector Garcia-Molina, Stanford University |
Title: |
How to Crawl the Web |
DB
meeting: |
Friday, November 10th, 2:00 pm, DC1331 |
Speaker: |
Hui Zhang |
Topic: |
I will present the paper titled "Quilt: An XML Query Language for Heterogeneous
Data Sources" and review Quilt solutions to some use cases posed in Appendix
B of the "XML Query Requirements" document.
Here are some references:
- Quilt paper
- XML Query Requirements
- Quilt solutions:
|
Snacks: |
Airi Salminen |
DB
meeting: |
Friday, November 17th, 2:00 pm, DC1331 |
Speaker: |
Yasser Ebrahim |
Topic: |
I intend to present a paper titled "Why a picture is sometimes worth
a 1000 words". This paper compares textual and diagrammatic representations
and tries to explain why the latter could be superior to the former. |
Reference: |
Larkin, J. and Simon, H., Why a Diagram Is (Sometimes) Worth
Ten Thousand Words, in Diagrammatic Reasoning, AAAI/The MIT Press, Menlo
Park, California, pp. 69-109, 1995 |
Snacks: |
Hui Zhang |
DB
meeting: |
Friday, November 24th, 2:00 pm, DC1331 |
Speaker: |
Jianchao Han |
Topic: |
Interactive Construction of Classifiers
This talk will describe two systems that I implemented: CVizT -- interactive
construction of classification rules based on the Table Lens visualization
technique, and DTViz -- interactive construction of decision trees based
on the Parallel Segments visualization technique. |
Snacks: |
Yasser Ebrahim |
DB
meeting: |
Friday, December 1st, 2:00 pm, DC1331 |
Speaker: |
Ian Davis |
Topic: |
Indexing XML
Since 1997 I have been developing a SQL2 database search engine, which
facilitates efficient access to fragments of structured text. As part of
this software development a subordinate text engine was developed.
The objective of a text engine is to provide rapid retrieval of useful
textual information, to be efficient in both space and time, and to provide
sufficient flexibility to allow for easy growth in as yet undefined directions,
as the needs of the XML community become clearer. A further as yet unrealised
goal is to allow rapid update of text indices as minor changes are made
to the text being indexed.
In this talk I will explore the approach I use to index structured
text, the methods used to compress these indices, the data structures produced
by the indexing process, and the use made of these data structures in resolving
queries made against instances of text. |
Snacks: |
Jianchao Han |
Seminar: |
Thursday, December 7th, 10:30 am, DC1304 |
Speaker: |
Laks V. S. Lakshmanan, Concordia University |
Topic: |
Constraints and Structures in Data Mining
Spurred by the potential of discovering interesting and useful knowledge,
substantial research has been done on data mining. Most previous work essentially
falls into one of two "generations". In the first generation, the focus
was on identifying which patterns are interesting and significant, and
devising fast algorithms for mining them from large data sets. In the second
generation, the importance of integrating data mining with the other key
components of the knowledge discovery (KDD) process has been recognized.
One such component is the underlying DBMS, and strategies and architectures
for integrating association rule discovery with the DBMS have been studied.
In this talk, I will focus on the other, equally (if not more) important
component of KDD -- the human user. Many mining algorithms devised in the
first generation implicitly assume data mining is a one-shot exercise,
as opposed to the iterative exploratory process it really is. This is a
significant shortcoming since mining algorithms tend to be computationally
intensive. Thus, the concerns in integrating the user in the loop are:
(i) how can the user control the nature of the mining computation undertaken
by the system at any point? (ii) how can the user enforce focus on mining
based on his knowledge of application semantics? (iii) how can the user
migrate from specific mining tasks performed, to issuing ad hoc mining
queries? I will discuss how constraints can play a significant role in
addressing these concerns, specifically in the domains of frequent sets
and clustering. Many a time, finding when a pattern holds in a data
set can be as interesting as the pattern itself: e.g., when is an itemset
frequently purchased? I will briefly discuss algorithms and structures
that facilitate this kind of dual mining.
Finally, the data mining process tends to involve multiple mining tasks,
such as classification, frequent sets, data cube, etc. I will briefly discuss
our recent work on a model and an algebra for data mining that neatly integrates
many mining tasks into one framework so the input of one mining task can
be the output of another. |
DB
meeting: |
Friday, December 8th, 2:00 pm, DC1304
(Not the usual room!) |
Speaker: |
Matthew Young-Lai |
Topic: |
Longest Regular Expression Matching
In general, it is inefficient to search for longest matches for a regular
expression using a single pass. I will describe two ways of dealing with
this. |
Snacks: |
Ian Davis |
This page is maintained by Frank Tompa and Ken Salem.