Cheshire at GeoCLEF 2007: Retesting Text Retrieval Baselines

In the GeoCLEF 2007 evaluation, a basic retrieval method, logistic regression with blind feedback, was tested to establish baseline data for later experiments. The baseline was more effective than expected, possibly because of a new machine translation system (LEC PowerTranslator) chosen for its support of all languages in the CLEF tasks. Future steps include adding back geographic processing, getting decompounding working for German, and comparing the new machine translation system against others. A further goal is to start exploiting the geographic data in ImageCLEFPhoto.

Presentation Transcript


  1. Cheshire at GeoCLEF 2007: Retesting Text Retrieval Baselines • Ray R. Larson, School of Information, University of California, Berkeley

  2. Motivation • In previous GeoCLEF evaluations we found very mixed results from various methods of query expansion, attempts at explicit geographic constraints, etc. • For this year we decided to try just our “basic” retrieval method, i.e., logistic regression with blind feedback • The goal was to establish baseline data that we can use to test selective additions in later experiments GeoCLEF 2007 -- Budapest
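As a rough illustration of the blind feedback step named on this slide, the sketch below runs a first-pass ranking, treats the top-ranked documents as if they were relevant, adds their most frequent new terms to the query, and ranks again. It is only a sketch under stated assumptions: the scorer is a simple TF-IDF stand-in rather than the logistic regression ranking the slides refer to, and the function names, the tiny stoplist, and the `top_docs`/`new_terms` parameters are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical mini stoplist, for illustration only.
STOPWORDS = {"in", "and", "of", "the"}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming, no decompounding)."""
    return text.lower().split()


def score(query_terms, doc_tokens, doc_freq, n_docs):
    """Stand-in TF-IDF score; NOT the logistic regression ranking from the slides."""
    tf = Counter(doc_tokens)
    total = 0.0
    for term in query_terms:
        if tf[term]:
            idf = math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1.0
            total += tf[term] * idf
    return total


def rank(query_terms, docs, doc_freq):
    """Return ids of documents with a non-zero score, best first."""
    scored = [(score(query_terms, toks, doc_freq, len(docs)), doc_id)
              for doc_id, toks in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in scored if s > 0]


def blind_feedback(query, raw_docs, top_docs=2, new_terms=2):
    """Two-pass retrieval: rank, expand the query from the top documents, rank again."""
    docs = {doc_id: tokenize(text) for doc_id, text in raw_docs.items()}
    doc_freq = Counter(t for toks in docs.values() for t in set(toks))
    query_terms = tokenize(query)

    initial = rank(query_terms, docs, doc_freq)

    # "Blind" assumption: the top-ranked documents are relevant.
    pool = Counter()
    for doc_id in initial[:top_docs]:
        pool.update(t for t in docs[doc_id]
                    if t not in query_terms and t not in STOPWORDS)

    # Most frequent new terms first; ties broken alphabetically.
    ordered = sorted(pool.items(), key=lambda pair: (-pair[1], pair[0]))
    expanded = query_terms + [term for term, _ in ordered[:new_terms]]

    return initial, expanded, rank(expanded, docs, doc_freq)
```

With a query such as "acid rain", a document that mentions only forests can enter the second-pass ranking once "forests" has been added from the top first-pass documents. That is the intended effect, and it is also why poorly chosen expansion terms can hurt precision.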

  3. Results • We didn’t expect our “baseline” approach would be as effective as it appears to have been • Some of the success of our runs this year *may* be due to the use of a new MT system • LEC PowerTranslator, chosen for its support for ALL of the languages in all of the CLEF tasks in which we participate

  4. Comparison of Results 2006-2007

  5. What happened in German? • No decompounding • 2006 used Aitao Chen’s decompounding • Worse translation? • Possibly - different MT systems were used • Incomplete stoplist? • Was it really the same? • Was stemming the same?
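To make the decompounding point concrete: German writes compounds as single words, so a query term like "Waldsterben" will not match documents indexed under "Wald" or "Sterben" unless compounds are split. The sketch below is a minimal dictionary-based splitter, assuming a hypothetical lexicon of known word parts and allowing an optional linking "s" between parts. It is not the decompounding procedure used in the 2006 runs, which the slides do not describe.

```python
def decompound(word, lexicon, min_len=3):
    """Split `word` into lexicon entries; return [word] if no full split exists.

    `lexicon` is a hypothetical set of lowercase word parts. Longer prefixes
    are tried first, and a linking "s" between parts may be skipped.
    """
    word = word.lower()

    def split(rest):
        if not rest:
            return []
        # Try the longest candidate prefix first.
        for end in range(len(rest), min_len - 1, -1):
            head = rest[:end]
            if head not in lexicon:
                continue
            tail = rest[end:]
            for link in ("", "s"):
                if not tail.startswith(link):
                    continue
                # A linking "s" must be followed by another part.
                if link and len(tail) == len(link):
                    continue
                parts = split(tail[len(link):])
                if parts is not None:
                    return [head] + parts
        return None  # no way to cover `rest` with lexicon entries

    parts = split(word)
    return parts if parts and len(parts) > 1 else [word]


# Hypothetical lexicon, for illustration only.
LEXICON = {"wald", "sterben", "flug", "hafen", "sicherheit", "arbeit", "markt"}
```

For example, `decompound("Arbeitsmarkt", LEXICON)` yields the parts `arbeit` and `markt`, while a word with no full split in the lexicon is returned unchanged. Real systems usually also weigh competing splits by corpus frequency, which this sketch does not attempt.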

  6. Why did it work? • This is all speculation, but… • Were complex geographic expressions in the queries reflected in usage in the text? • Was the geographic context constrained or implied by the topic? • E.g. Acid rain seems to be more common (or at least more written about) in Northern Europe • Airline pilots seek to avoid populated areas in cases of emergencies that might lead to a crash • We don’t yet seem to be finding queries that require real geographic reasoning

  7. What Next? • Start adding back true geographic processing and test where and why (and if) results are improved • Get decompounding working with German • Also test the new MT system versus Babelfish, the L&H system and Promt • Start exploiting the Geographic data in ImageCLEFPhoto
