Database Research Group Events

Spring 1999

 
Time: Research group meeting, Friday May 7,  2:00 pm, DC1331
Speaker: Khuzaima Daudjee
Snacks: Ian Davis
Topic: I'll talk about the Mariposa distributed database system this Friday. Mariposa was designed and developed at Berkeley by Stonebraker, Aoki, Sah, Staelin, Sidell and a few others.

Time: Research group meeting, Friday May 14, 2:00 pm, DC1331
Speaker: Reem Al-Halimi
Snacks: Khuzaima Daudjee
Topic: I will give an overview of data and text mining and present a text mining effort that uses data mining techniques to discover phrase usage trends in documents. The main paper is: B. Lent, R. Agrawal, R. Srikant: "Discovering Trends in Text Databases", Proc. of the 3rd Int'l Conference on Knowledge Discovery in Databases and Data Mining, Newport Beach, California, August 1997.  and can be found along with other papers this work is based on at: http://www.almaden.ibm.com/cs/people/ragrawal/pubs.html

Time: Research group meeting, Friday, May 28, 2:00 pm, DC1331
Speaker: Rudolf Fleischer
Snacks: Reem Al-Halimi
Topic: In the page replication problem, we are given a weighted graph and a start node $s$ which initially contains a page.  Other nodes which want to access this page can do so by sending a request to any other node holding the page.  The cost of this access is then the distance between the two nodes.  A node can also choose to copy the whole page at a cost of $d$ times the distance, where $d$ is usually a large constant (the page size).  If the sequence of requests is not known in advance, we have an online problem so we use competitive analysis to measure the performance of algorithms, i.e., we compare the cost of our algorithm to the cost of the best possible offline algorithm. Various graph topologies (trees, rings, arbitrary graphs) have been studied in this model.

In this talk, we present several variants of the classical page replication problem and give some new upper and lower bounds.  In the {\em continuous page replication problem} we allow requests at and replication to arbitrary points on any edge of the graph.  In this model randomization does not help, and the deterministic algorithms are equivalent to randomized algorithms in the classical discrete model.  In the {\em unequal cost model} we assume that the online algorithm has a replication factor different from $d$, the replication factor of the offline algorithm.  We give optimal deterministic and randomized algorithms for the discrete and continuous variant of this model on trees. From this we can derive much simpler proofs for known algorithms on rings.


Time: Research group meeting, Friday, June 4, 2:00 pm, DC1331
Speaker: Julia Johnson
Snacks: Ian Munro
Topic: Rough Sets for Informative Question Answering
When it is not possible to give a precise answer, it may be possible to give an imprecise answer which is nevertheless informative. The rough set model will be demonstrated for distinguishing between precise answers that say "There are no objects X" or "The objects X are ..." from uncertain answers of the form "Objects X are included in set Y" or "There may be additional objects X other than those in set Y". These uncertain answers offer an improvement over the more traditional precise answers because the system is better able to report on its lack of knowledge. 


Time: Research group meeting, Friday June 11,  2:00 pm, DC1331
Speaker: Forbes Burkowski
Snacks: Ian Munro
Topic: Forbes will be talking about evolutionary computing, and its use in data mining applications.