# Database Research Group Events

## Fall 2000

DB meeting: Friday, September 8, 2:00 pm, DC1331 Speaker: N/A Topic: 1. General introductions around the table, including brief descriptions of research interests. 2. Lab update. 3. Speaker rotation.

DB meeting: Friday, September 15, 2:00 pm, Niagara Room, Sybase iAnywhere Solutions, 415 Phillip St. Speaker: Anil Goel Topic: I'll talk about the issue of creating and maintaining low-cost histograms and describe the algorithms presented in the following paper: Self-tuning Histograms: Building Histograms Without Looking at Data, Ashraf Aboulnaga and Surajit Chaudhuri, SIGMOD 99. Note: This talk will take place at Sybase and will be followed by a brief tour for those who are interested. Sybase is located at 415 Phillip Street, just north of the corner of Phillip and Columbia, and a short walk from the Davis Centre. Enter through the front door, and sign in with the receptionist. A group will leave the Davis Centre on foot for Sybase at 1:45 pm sharp. We'll depart from the Davis Centre north door, near ICR. Join us if you'd like some company for the walk.

DB meeting: Friday, September 22, 2:00 pm, DC1331 Speaker: Heng Yu Topic: A look at What's not in a name: Some Properties of a Purely Structural Approach to Integrating Description Logic Terminologies, by Alex Borgida and Ralf Kuesters, presented at the 2000 International Workshop on Description Logics. One approach to integrating knowledge bases is based on finding assertions that relate the expressions in the constituent terminologies. For knowledge bases with many terms this task requires computer support. The authors set up a formal framework for merging Description Logic TBoxes, and then explore the limits of a purely structural approach to the problem of finding inter-relationships between knowledge bases. Some theoretical notions are empirically examined in a real medical ontology (GALEN). References: (For Description Logic Background:) Description Logics in Data Management, A. Borgida, IEEE TKDE, 1995. (For Schema Integration Background:) A Comparative Analysis of Methodologies for Database Schema Integration, C. Batini, M. Lenzerini, S. Navathe, ACM Computing Surveys, 18(4), 1986. (For More Detailed Work of the Authors:) "What's not in a name?" Initial Explorations of a Structural Approach to Integrating Large Concept Knowledge-Bases, A. Borgida, R. Kuesters, Technical Report DCS-TR-391, CS Rutgers Univ., 1999. Snacks: Anil Goel

DB meeting: Friday, September 29, 2:00 pm, DC1304 (Not the usual room!) Speaker: Vlado Keselj Topic: A Unification-based Approach to Question Answering. The problem of Question Answering (QA) as used in TREC can be formulated as follows: given a collection of natural-language (NL) documents, find an answer to a given NL query that is a short substring of one of the documents and is found in a relevant context. I will: discuss the problem in the context of the problems of IR, IE, and classical QA, and why it is important; give a description of the TREC QA task; present some related systems: AskJeeves.com, FAQ Finder, predictive annotation, and ExtrAns; and present a novel approach based on stochastic unification-based grammars. Snacks: Heng Yu

 DB meeting: Friday, October 6, 2:00 pm, DC1331 Speaker: Gord Cormack Topic: Question Answering at TREC 9 For TREC 9, we completed some of the experiments that we didn't get done in time for TREC 8. We parsed the questions, and used the parse to enhance the set of search terms and to locate the answer within candidate passages. I will present some results and ideas for further development. Snacks: Vlado Keselj

DB meeting: Friday, October 13th, 2:00 pm, DC1331 Speaker: Bill O'Connell, IBM Toronto Topic: DB2 Universal Database in the Astronomical Expansion of the e-business Universe. This talk will explain how DB2 is evolving to meet the extensive demands of e-business. Starting with an assessment of the role databases play in the application development process in the world of e-business, I will show you how users leverage the core e-business standards optimized in DB2 application development. Next, I'll address the expansion of the new functions in DB2 V7 for UNIX, Windows and OS/2 that address business intelligence environments in the e-business world. This includes features to support rolling OLAP functions, enhancement to OLAP cube support, advanced statistical analysis, complex query cache management, and how DB2 helps OLAP and mining tools, and ERP applications. The session will also include an analysis of XML in the handling of business objects and business-to-business and a demonstration of DB2's capabilities in this domain as it relates to business intelligence.

 SPECIAL: Friday, October 20, 2:30-4:00 PM, Humanities Theatre Speaker: Donald E. Knuth, Professor Emeritus, Stanford Univ. Title: The Joy of Asymptotics

 DB seminar: Monday, October 23, 11:00 AM, MC 5158 Speaker: Vincent Oria, New Jersey Institute of Technology Title: Courseware-On-Demand: Generating New Course Material From Existing Courses

DB meeting: Friday, October 27th, 2:00 pm, DC1331 Speaker: Airi Salminen Topic: Grammars++ revisited. I will briefly describe the Grammars++ model for text, jointly developed with Frank Tompa. The model is based on the notion of a constraining grammar. Given a context-free grammar G as a base grammar, a constraining grammar is derived from productions of G by attaching predicates to selected non-terminals. Sequences of constraining grammars are then used as filters to specify a subset of information in structured text. I will also give a quick overview of some successful applications of the model for data retrieval, transformation, and hypertext creation. I will then discuss the potential for applying the model to XML data and introduce some interesting research problems. Reference: A. Salminen and F. W. Tompa, "Grammars++ for Modelling Information in Text," Information Systems, Vol. 24, No. 1 (1999) 1-24. Snacks: Gord Cormack

DB meeting: Friday, November 3rd, 2:00 pm, DC1331 Speaker: Gary Promhouse, Open Text Topic: Vertical Relational Database Representations. We start with a brief review of the literature on vertical database representations. We then describe a representation that achieves significant compaction, while simultaneously supporting the basic operations of selection and join without auxiliary index structures. Experiments on some real-life tables have realized several orders of magnitude reduction in representation size, while simultaneously greatly reducing the computational cost of these fundamental operations.

 DB seminar: Monday, November 6, 11:00 AM, DC 1304 Speaker: Hector Garcia-Molina, Stanford University Title: How to Crawl the Web

 DB meeting: Friday, November 10th, 2:00 pm, DC1331 Speaker: Hui Zhang Topic: I will present the paper titled "Quilt: An XML Query Language for Heterogeneous Data Sources" and review Quilt solutions to some use cases posed in Appendix B of the "XML Query Requirements" document. Here are some references: Snacks: Airi Salminen

DB meeting: Friday, November 17th, 2:00 pm, DC1331 Speaker: Yasser Ebrahim Topic: I intend to present a paper titled "Why a picture is sometimes worth a 1000 words". This paper compares textual and diagrammatic representations and tries to explain why the latter could be superior to the former. Reference: Larkin, J. and Simon, H., Why a Diagram Is (Sometimes) Worth Ten Thousand Words, in Diagrammatic Reasoning, AAAI/The MIT Press, Menlo Park, California, pp. 69-109, 1995. Snacks: Hui Zhang

 DB seminar: Monday, November 20, 11:00 AM, DC 1304 Speaker: Avigdor Gal, Rutgers University Title: An Authorization Model for Temporal Data

DB meeting: Friday, November 24th, 2:00 pm, DC1331 Speaker: Jianchao Han Topic: Interactive Construction of Classifiers. This talk will describe two systems that I implemented: CVizT, for interactive construction of classification rules based on the Table Lens visualization technique, and DTViz, for interactive construction of decision trees based on the Parallel Segments visualization technique. Snacks: Yasser Ebrahim

DB meeting: Friday, December 1st, 2:00 pm, DC1331 Speaker: Ian Davis Topic: Indexing XML. Since 1997 I have been developing a SQL2 database search engine, which facilitates efficient access to fragments of structured text. As part of this software development a subordinate text engine was developed. The objective of a text engine is to provide rapid retrieval of useful textual information, to be efficient in both space and time, and to provide sufficient flexibility to allow for easy growth in as yet undefined directions, as the needs of the XML community become clearer. A further, as yet unrealised, goal is to allow rapid update of text indices as minor changes are made to the text being indexed. In this talk I will explore the approach I use to index structured text, the methods used to compress these indices, the data structures produced by the indexing process, and the use made of these data structures in resolving queries made against instances of text. Snacks: Jianchao Han

 DB seminar: Monday, December 4, 11:00 AM, DC 1304 Speaker: Jan Chomicki, SUNY at Buffalo Title: Consistent Query Answers in Inconsistent Databases

Seminar: Thursday, December 7th, 10:30 am, DC1304 Speaker: Laks V. S. Lakshmanan, Concordia University Topic: Constraints and Structures in Data Mining. Spurred by the potential of discovering interesting and useful knowledge, substantial research has been done on data mining. Most previous work essentially falls into one of two "generations". In the first generation, the focus was on identifying which patterns are interesting and significant, and devising fast algorithms for mining them from large data sets. In the second generation, the importance of integrating data mining with the other key components of the knowledge discovery (KDD) process has been recognized. One such component is the underlying DBMS, and strategies and architectures for integrating association rule discovery with the DBMS have been studied. In this talk, I will focus on the other, equally (if not more) important component of KDD: the human user. Many mining algorithms devised in the first generation implicitly assume data mining is a one-shot exercise, as opposed to the iterative exploratory process it really is. This is a significant shortcoming since mining algorithms tend to be computationally intensive. Thus, the concerns in integrating the user in the loop are: (i) how can the user control the nature of the mining computation undertaken by the system at any point? (ii) how can the user enforce focus on mining based on his knowledge of application semantics? (iii) how can the user migrate from specific mining tasks performed, to issuing ad hoc mining queries? I will discuss how constraints can play a significant role in addressing these concerns, specifically in the domains of frequent sets and clustering. Often, finding *when* a pattern holds in a data set can be as interesting as the pattern itself: e.g., when is an itemset frequently purchased? I will briefly discuss algorithms and structures that facilitate this kind of dual mining.
Finally, the data mining process tends to involve multiple mining tasks, such as classification, frequent sets, data cube, etc. I will briefly discuss our recent work on a model and an algebra for data mining that neatly integrates many mining tasks into one framework so the input of one mining task can be the output of another.

DB meeting: Friday, December 8th, 2:00 pm, DC1304 (Not the usual room!) Speaker: Matthew Young-Lai Topic: Longest Regular Expression Matching. In general, it is inefficient to search for longest matches for a regular expression using a single pass. I will describe two ways of dealing with this. Snacks: Ian Davis