Cheshire at GeoCLEF 2007: Retesting Text Retrieval Baselines

Ray R. Larson, School of Information, University of California, Berkeley

Motivation
• In previous GeoCLEF evaluations we found very mixed results from various methods of query expansion, attempts at explicit geographic constraints, etc.
• This year we decided to try just our “basic” retrieval method
  • I.e., logistic regression with blind feedback
• The goal was to establish baseline data that we can use to test selective additions in later experiments
GeoCLEF 2007 -- Budapest
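The slides name the baseline method but not its details. As a rough illustration only, the sketch below ranks documents by a logistic-regression log-odds score over TREC2-style term-matching features, then runs one round of blind (pseudo-relevance) feedback: it takes the top-ranked documents, adds the non-query terms that occur in most of them, and re-ranks. The stoplist, the coefficients in `COEFFS`, the feedback parameters `k_docs`/`n_terms`, and all function names are hypothetical placeholders, not Cheshire's actual configuration; a real system fits the coefficients on relevance-judged data and usually selects feedback terms with a weighting formula (e.g., Robertson–Sparck Jones weights) rather than the simple document count used here.

```python
import math
from collections import Counter

# Hypothetical stoplist and placeholder coefficients, for illustration only.
STOPWORDS = {"a", "and", "in", "of", "the"}
COEFFS = [-3.5, 37.4, 0.33, -0.19, 0.09]  # intercept, then one weight per feature


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def log_odds(query, doc, coll_tf, coll_len):
    """Log-odds of relevance from TREC2-style features; None if no query term matches."""
    q_tf, d_tf = Counter(query), Counter(doc)
    matches = [t for t in q_tf if t in d_tf]
    if not matches:
        return None
    norm = 1.0 / math.sqrt(len(matches) + 1)
    x1 = norm * sum(q_tf[t] / (len(query) + 35) for t in matches)
    x2 = norm * sum(math.log(d_tf[t] / (len(doc) + 80)) for t in matches)
    x3 = norm * sum(math.log(coll_tf[t] / coll_len) for t in matches)
    x4 = float(len(matches))
    return COEFFS[0] + sum(c * x for c, x in zip(COEFFS[1:], (x1, x2, x3, x4)))


def rank(query, docs, coll_tf, coll_len):
    """Return ids of documents matching at least one query term, best first."""
    scored = []
    for doc_id, doc in docs.items():
        s = log_odds(query, doc, coll_tf, coll_len)
        if s is not None:
            scored.append((s, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]


def blind_feedback(query, docs, coll_tf, coll_len, k_docs=2, n_terms=1):
    """One round of pseudo-relevance feedback: expand the query, then re-rank."""
    top = rank(query, docs, coll_tf, coll_len)[:k_docs]
    # Score candidate terms by how many top-ranked documents contain them.
    counts = Counter()
    for doc_id in top:
        counts.update(set(docs[doc_id]) - set(query))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    expansion = [t for t, _ in ordered[:n_terms]]
    return expansion, rank(query + expansion, docs, coll_tf, coll_len)


if __name__ == "__main__":
    texts = {
        "d1": "acid rain damages forests in northern europe",
        "d2": "acid rain and forests in scandinavia",
        "d3": "airline pilots avoid populated areas",
        "d4": "forests cover much of scandinavia",
    }
    docs = {k: tokenize(v) for k, v in texts.items()}
    coll_tf = Counter(t for d in docs.values() for t in d)
    coll_len = sum(coll_tf.values())
    query = tokenize("acid rain")
    print(rank(query, docs, coll_tf, coll_len))
    print(blind_feedback(query, docs, coll_tf, coll_len))
```

On this toy collection the expansion term is "forests", which lets `d4` (no query term in common with "acid rain") enter the re-ranked list. That is the intended effect of blind feedback, and also its risk: if the top-ranked documents are off-topic, the added terms pull the query off-topic too.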

Results
• We didn’t expect our “baseline” approach to be as effective as it appears to have been
• Some of the success of our runs this year *may* be due to the use of a new MT system
  • LEC PowerTranslator, chosen because it supports ALL of the languages in all of the CLEF tasks in which we participate

Comparison of Results 2006-2007

What happened in German?
• No decompounding
  • 2006 used Aitao Chen’s decompounding
• Worse translation?
  • Possibly; different MT systems were used
• Incomplete stoplist?
  • Was it really the same?
• Was stemming the same?
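Decompounding splits German compounds into their parts so that, for example, a query word can match a compound in the text that contains it. The slides do not describe Aitao Chen’s method, so the sketch below is a generic dictionary-based splitter, not his algorithm: it searches for a split of the word into lexicon entries, allows a few linking elements between parts, and prefers the split with the fewest components. `LEXICON`, `LINKERS`, and `decompound` are hypothetical names, and the toy lexicon stands in for the large German word list a real system would need.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real decompounder needs a large German word list.
LEXICON = {"wein", "anbau", "an", "bau", "gebiet", "sicherheit", "rat"}
# Hypothetical subset of German linking elements (Fugenelemente) allowed between parts.
LINKERS = ("", "s", "es")


def decompound(word, lexicon=LEXICON, min_len=2):
    """Split a compound into lexicon words, preferring the fewest components.

    Returns [word] unchanged if no complete split exists.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(i):
        # Fewest-component split of word[i:], or None if there is none.
        if i == len(word):
            return ()
        candidates = []
        for j in range(i + min_len, len(word) + 1):
            part = word[i:j]
            if part not in lexicon:
                continue
            for link in LINKERS:
                k = j + len(link)
                # A linking element may only sit between two components.
                if link and (k >= len(word) or word[j:k] != link):
                    continue
                rest = best(k)
                if rest is not None:
                    candidates.append((part,) + rest)
        return min(candidates, key=len) if candidates else None

    parts = best(0)
    return list(parts) if parts else [word]


if __name__ == "__main__":
    for w in ("Weinanbaugebiet", "Sicherheitsrat", "Budapest"):
        print(w, decompound(w))
```

Here "Weinanbaugebiet" could also be split as wein + an + bau + gebiet; the fewest-components rule picks wein + anbau + gebiet. With a full word list such ambiguous splits become common, so the tie-breaking rule matters as much as the lexicon.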

Why did it work?
• This is all speculation, but…
• Were complex geographic expressions in the queries reflected in how those expressions are used in the text?
• Was the geographic context constrained or implied by the topic?
  • E.g., acid rain seems to be more common (or at least more written about) in Northern Europe
  • Airline pilots seek to avoid populated areas in emergencies that might lead to a crash
• We don’t yet seem to be finding queries that require real geographic reasoning

What Next?
• Start adding back true geographic processing and test where and why (and if) results are improved
• Get decompounding working with German
• Also test the new MT system versus Babelfish, the L&H system, and Promt
• Start exploiting the geographic data in ImageCLEFPhoto
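Testing “where and why (and if)” a change helps, whether it is added geographic processing or a different MT system, comes down to comparing runs topic by topic rather than only by a single average. The sketch below is one simple way to do that, assuming runs and relevance judgments are already held as plain Python dicts (a hypothetical layout, not a parser for CLEF or trec_eval files): it computes non-interpolated average precision per topic and MAP for each run, counts the topics on which run B beats or loses to run A, and applies a two-sided sign test to those counts. The sign test is only one choice; paired t-tests and randomization tests are common alternatives.

```python
from math import comb


def average_precision(ranked, relevant):
    """Non-interpolated average precision of one ranked list of document ids."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def compare_runs(run_a, run_b, qrels):
    """Per-topic AP for two runs plus a two-sided sign test on the differences.

    run_a, run_b: {topic: [doc ids, best first]}; qrels: {topic: set of relevant ids}.
    Returns (map_a, map_b, wins_b, losses_b, p_value); tied topics are dropped.
    """
    topics = sorted(qrels)
    ap_a = {t: average_precision(run_a.get(t, []), qrels[t]) for t in topics}
    ap_b = {t: average_precision(run_b.get(t, []), qrels[t]) for t in topics}
    wins = sum(ap_b[t] > ap_a[t] for t in topics)
    losses = sum(ap_b[t] < ap_a[t] for t in topics)
    n = wins + losses
    # Two-sided binomial tail with success probability 0.5.
    tail = sum(comb(n, i) for i in range(min(wins, losses) + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    map_a = sum(ap_a.values()) / len(topics)
    map_b = sum(ap_b.values()) / len(topics)
    return map_a, map_b, wins, losses, p_value
```

Keeping the per-topic AP values, not just the returned summary, is what makes the “where and why” part possible: the topics with the largest gains or losses are the ones worth reading by hand.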
