In the GeoCLEF 2007 evaluation, a basic retrieval method, logistic regression with blind feedback, was tested to establish baseline data. The baseline proved more effective than expected, possibly because of a new machine translation system chosen for its support of all the languages in the CLEF tasks. Next steps include adding back geographic processing, getting decompounding working for German, and comparing the new machine translation system against others. Planned work also includes exploiting the geographic data in ImageCLEFPhoto.
Cheshire at GeoCLEF 2007: Retesting Text Retrieval Baselines
Ray R. Larson
School of Information, University of California, Berkeley
Motivation
• In previous GeoCLEF evaluations we found very mixed results in using various methods of query expansion, attempts at explicit geographic constraints, etc.
• For this year we decided to try just our “basic” retrieval method
  • I.e., logistic regression with blind feedback
• The goal was to establish baseline data that we can use to test selective additions in later experiments
GeoCLEF 2007 -- Budapest
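The “basic” method named above pairs a probabilistic ranking with blind (pseudo-relevance) feedback. The following is a minimal Python sketch of the feedback loop only, under stated assumptions: the slides do not give the logistic regression formula, so a toy TF-IDF score stands in for it, and every name here (`toy_score`, `blind_feedback`, `STOPWORDS`, `SAMPLE_DOCS`) is hypothetical and not taken from Cheshire.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stoplist and collection for illustration.
STOPWORDS = {"in", "and", "the", "of"}
SAMPLE_DOCS = {
    "d1": "acid rain damages forests in northern europe",
    "d2": "acid rain and forests in scandinavia",
    "d3": "airline pilots avoid populated areas",
}


def toy_score(query_terms, doc_terms, doc_freq, n_docs):
    """Toy TF-IDF score; an assumed stand-in for the logistic regression ranking."""
    tf = Counter(doc_terms)
    return sum(
        tf[t] * math.log((n_docs + 1) / (doc_freq.get(t, 0) + 1))
        for t in set(query_terms)
    )


def blind_feedback(query, docs, top_docs=2, top_terms=2):
    """Rank, treat the top documents as relevant, expand the query, re-rank."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(t for terms in tokenized.values() for t in set(terms))

    def rank(terms):
        return sorted(
            tokenized,
            key=lambda d: toy_score(terms, tokenized[d], doc_freq, n_docs),
            reverse=True,
        )

    query_terms = query.lower().split()
    initial = rank(query_terms)

    # "Blind" step: no relevance judgements, just assume the top hits are relevant.
    pool = Counter()
    for doc_id in initial[:top_docs]:
        pool.update(
            t for t in tokenized[doc_id]
            if t not in query_terms and t not in STOPWORDS
        )

    expanded = query_terms + [t for t, _ in pool.most_common(top_terms)]
    return expanded, rank(expanded)


expanded_terms, ranking = blind_feedback("acid rain", SAMPLE_DOCS)
```

In this sketch the expansion terms are simply the most frequent non-query terms in the top-ranked documents; a real system would typically weight candidate terms more carefully before adding them.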
Results
• We didn’t expect our “baseline” approach to be as effective as it appears to have been
• Some of the success of our runs this year *may* be due to the use of a new MT system
  • LEC PowerTranslator, chosen for its support for ALL of the languages in all of the CLEF tasks in which we participate
Comparison of Results 2006-2007
What happened in German?
• No decompounding
  • 2006 used Aitao Chen’s decompounding
• Worse translation?
  • Possibly - different MT systems were used
• Incomplete stoplist?
  • Was it really the same?
• Was stemming the same?
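To illustrate what decompounding does for German retrieval, here is a minimal Python sketch of a dictionary-based splitter. It is an assumption-laden illustration, not Aitao Chen’s method, which the slides do not describe: the function `decompound`, the `min_len` parameter, and the tiny lexicon are all hypothetical, and linking elements such as the German “s” between parts are not handled.

```python
def decompound(word, lexicon, min_len=3):
    """Split `word` into the fewest lexicon entries; return [word] if no split exists."""
    word = word.lower()
    # best[i] holds the fewest-parts split found so far for the prefix word[:i].
    best = {0: []}
    for end in range(1, len(word) + 1):
        for start in range(0, end - min_len + 1):
            part = word[start:end]
            if start in best and part in lexicon:
                candidate = best[start] + [part]
                if end not in best or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best.get(len(word), [word])


# Hypothetical lexicon of known base words.
LEXICON = {"wald", "sterben", "luft", "verschmutzung"}

parts = decompound("Waldsterben", LEXICON)
unknown = decompound("Budapest", LEXICON)
```

Without a step like this, a query term such as “Wald” cannot match a document that only contains the compound “Waldsterben”, which is one plausible reason for a drop in German results.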
Why did it work?
• This is all speculation, but…
• Were complex geographic expressions in the queries reflected in usage in the text?
• Was the geographic context constrained or implied by the topic?
  • E.g., acid rain seems to be more common (or at least more written about) in Northern Europe
  • Airline pilots seek to avoid populated areas in cases of emergencies that might lead to a crash
• We don’t yet seem to be finding queries that require real geographic reasoning
What Next?
• Start adding back true geographic processing and test where and why (and if) results are improved
• Get decompounding working with German
• Also test the new MT system versus Babelfish, the L&H system and Promt
• Start exploiting the geographic data in ImageCLEFPhoto