Time: |
Research group meeting, Friday, May 7, 2:00 pm, DC1331 |

Speaker: |
Khuzaima Daudjee |

Snacks: |
Ian Davis |

Topic: |
I'll talk about the Mariposa distributed database system this Friday. Mariposa was designed and developed at Berkeley by Stonebraker, Aoki, Sah, Staelin, Sidell and a few others. |

Time: |
Research group meeting, Friday, May 14, 2:00 pm, DC1331 |

Speaker: |
Reem Al-Halimi |

Snacks: |
Khuzaima Daudjee |

Topic: |
I will give an overview of data and text mining and present a text mining effort that uses data mining techniques to discover phrase usage trends in documents. The main paper is B. Lent, R. Agrawal, R. Srikant: "Discovering Trends in Text Databases", Proc. of the 3rd Int'l Conference on Knowledge Discovery in Databases and Data Mining, Newport Beach, California, August 1997. It can be found, along with other papers this work is based on, at: http://www.almaden.ibm.com/cs/people/ragrawal/pubs.html |

Time: |
Research group meeting, Friday, May 28, 2:00 pm, DC1331 |

Speaker: |
Rudolf Fleischer |

Snacks: |
Reem Al-Halimi |

Topic: |
In the page replication problem, we are given a weighted graph and
a start node $s$ which initially contains a page. Other nodes which
want to access this page can do so by sending a request to any other node
holding the page. The cost of this access is then the distance between
the two nodes. A node can also choose to copy the whole page at a
cost of $d$ times the distance, where $d$ is usually a large constant (the
page size). If the sequence of requests is not known in advance,
we have an online problem, so we use competitive analysis to measure the
performance of algorithms, i.e., we compare the cost of our algorithm to
the cost of the best possible offline algorithm. Various graph topologies
(trees, rings, arbitrary graphs) have been studied in this model.
In this talk, we present several variants of the classical page replication problem and give some new upper and lower bounds. In the {\em continuous page replication problem} we allow requests at and replication to arbitrary points on any edge of the graph. In this model, randomization does not help, and the deterministic algorithms are equivalent to randomized algorithms in the classical discrete model. In the {\em unequal cost model} we assume that the online algorithm has a replication factor different from $d$, the replication factor of the offline algorithm. We give optimal deterministic and randomized algorithms for the discrete and continuous variants of this model on trees. From this we can derive much simpler proofs for known algorithms on rings. |
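To make the cost model concrete, here is a minimal sketch of competitive analysis on the simplest possible instance: a single edge of length `distance` between the start node $s$ and one requesting node. The function names, the `threshold` parameter, and the "replicate after `threshold` remote accesses" rule are assumptions chosen for illustration; they are not taken from the talk, and the sketch does not cover trees, rings, or the continuous and unequal cost variants.

```python
def offline_cost(requests: int, distance: float, d: float) -> float:
    """Cost of the best offline strategy on a single edge.

    Knowing the number of requests in advance, the offline algorithm either
    serves every request remotely or replicates the page immediately.
    """
    return min(requests * distance, d * distance)


def online_cost(requests: int, distance: float, d: float, threshold: int) -> float:
    """Cost of a hypothetical threshold strategy on a single edge.

    The first `threshold` requests are served remotely; if another request
    arrives, the page is replicated at cost d * distance and all later
    requests are free.
    """
    if requests <= threshold:
        return requests * distance
    return threshold * distance + d * distance


def worst_ratio(distance: float, d: float, threshold: int, max_requests: int) -> float:
    """Largest online/offline cost ratio over request sequences of length 1..max_requests."""
    return max(
        online_cost(r, distance, d, threshold) / offline_cost(r, distance, d)
        for r in range(1, max_requests + 1)
    )


if __name__ == "__main__":
    # With threshold equal to d, the online cost is never more than twice
    # the offline cost on this two-node instance.
    print(worst_ratio(distance=1.0, d=10, threshold=10, max_requests=50))
```

Under these assumptions, setting `threshold` equal to $d$ keeps the ratio at 2 or below, while a much smaller or much larger threshold lets an adversarial request sequence push the ratio higher.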

Time: |
Research group meeting, Friday, June 4, 2:00 pm, DC1331 |

Speaker: |
Julia Johnson |

Snacks: |
Ian Munro |

Topic: |
Rough Sets for Informative Question Answering
When it is not possible to give a precise answer, it may be possible to give an imprecise answer which is nevertheless informative. The rough set model will be demonstrated for distinguishing precise answers that say "There are no objects X" or "The objects X are ..." from uncertain answers of the form "Objects X are included in set Y" or "There may be additional objects X other than those in set Y". These uncertain answers offer an improvement over the more traditional precise answers because the system is better able to report on its lack of knowledge. |
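As a rough illustration of how such answers could be computed, the sketch below uses the standard rough set notions of lower and upper approximation. The function name, the toy objects, and their attributes are hypothetical, and the mapping of approximations to answer types is one plausible reading rather than the speaker's formulation.

```python
from collections import defaultdict


def approximations(attributes: dict, target: set) -> tuple:
    """Return the (lower, upper) approximations of `target`.

    `attributes` maps each object to a hashable description. Objects with the
    same description are indiscernible and form one equivalence class.
    """
    classes = defaultdict(set)
    for obj, description in attributes.items():
        classes[description].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # the whole class lies inside the target
            lower |= block
        if block & target:    # the class overlaps the target
            upper |= block
    return lower, upper


if __name__ == "__main__":
    # Hypothetical data: objects described by (colour, size).
    attributes = {
        "a": ("red", "small"),
        "b": ("red", "small"),
        "c": ("blue", "large"),
        "d": ("blue", "small"),
    }
    target = {"a", "c"}  # the objects X that the question asks about
    lower, upper = approximations(attributes, target)
    print(sorted(lower))  # objects certainly in X; there may be others
    print(sorted(upper))  # a set Y that is guaranteed to include X
```

When the two approximations coincide, the system can give a precise answer; when they differ, the lower approximation supports "there may be additional objects X" and the upper approximation supports "objects X are included in set Y".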

Time: |
Research group meeting, Friday, June 11, 2:00 pm, DC1331 |

Speaker: |
Forbes Burkowski |

Snacks: |
Ian Munro |

Topic: |
Forbes will be talking about evolutionary computing and its use in data mining applications. |