
Iterative Translation Disambiguation for Cross-Language Information Retrieval

Iterative Translation Disambiguation for Cross-Language Information Retrieval. Advisor: Dr. Hsu. Presenter: Yu-San Hsieh. Authors: Christof Monz and Bonnie J. Dorr. SIGIR 2005, pp. 520-527. Outline: Motivation, Objective, Approach, Experiment Result, Introduction, Experiment, Conclusions.


Presentation Transcript


  1. Iterative Translation Disambiguation for Cross-Language Information Retrieval Advisor: Dr. Hsu Presenter: Yu-San Hsieh Authors: Christof Monz and Bonnie J. Dorr, SIGIR 2005, pp. 520-527

  2. Outline • Motivation • Objective • Approach • Experiment Result • Introduction • Experiment • Conclusions

  3. Motivation • Many words and phrases in one language can be translated into another language in a number of ways, so translation ambiguity is very common and reduces the effectiveness of cross-language information retrieval. • Example: Penalty (English) => Elfmeter (soccer) or Strafe (punishment)

  4. Objective • Find an appropriate distribution of translation probabilities that resolves the translation ambiguity problem.

  5. Approach [Figure: the English terms europe, union, trade and their German translation candidates europa, union, gewerkschaft, handel, gewerbe, geschaeft] • Find an appropriate distribution of translation probabilities. • Computing term weights • Initialization step • Iteration step • Normalization step • Represent all term weights as a vector • Stop the iteration
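The weighting loop on this slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the update rule (a candidate keeps its old weight and gains the association-weighted support of the candidates of the other query terms), the Euclidean-distance stopping test, the threshold, and the toy association scores are all assumptions chosen for illustration, and `disambiguate` and `toy_assoc` are hypothetical names.

```python
import math
from typing import Callable, Dict, List, Tuple

# (source term, target-language candidate) -> current weight of that candidate
Weights = Dict[Tuple[str, str], float]


def disambiguate(
    translations: Dict[str, List[str]],
    assoc: Callable[[str, str], float],
    threshold: float = 1e-4,
    max_iter: int = 100,
) -> Weights:
    """Iteratively re-weight translation candidates (assumes assoc >= 0 and
    distinct, non-empty candidate lists)."""
    # Initialization step: each source term spreads weight 1 uniformly.
    w: Weights = {}
    for s, cands in translations.items():
        for t in cands:
            w[(s, t)] = 1.0 / len(cands)

    for _ in range(max_iter):
        # Iteration step: a candidate gains support from the candidates of the
        # *other* source terms, scaled by association strength and their weights.
        new_w: Weights = {}
        for (s, t), old in w.items():
            support = 0.0
            for (s2, t2), w2 in w.items():
                if s2 != s:
                    support += assoc(t, t2) * w2
            new_w[(s, t)] = old + support

        # Normalization step: the weights of each source term sum to 1 again.
        for s, cands in translations.items():
            total = sum(new_w[(s, t)] for t in cands)
            for t in cands:
                new_w[(s, t)] /= total

        # Iteration stop: Euclidean distance between successive weight vectors.
        delta = math.sqrt(sum((new_w[k] - w[k]) ** 2 for k in w))
        w = new_w
        if delta < threshold:
            break
    return w


# Toy query using the slide's example terms; the association scores are invented.
translations = {
    "europe": ["europa"],
    "union": ["union", "gewerkschaft"],
    "trade": ["handel", "gewerbe", "geschaeft"],
}
toy_scores = {
    frozenset(("europa", "union")): 0.9,
    frozenset(("europa", "handel")): 0.4,
    frozenset(("union", "handel")): 0.5,
    frozenset(("gewerkschaft", "gewerbe")): 0.1,
}


def toy_assoc(a: str, b: str) -> float:
    # Symmetric lookup; unlisted pairs have no association.
    return toy_scores.get(frozenset((a, b)), 0.0)


weights = disambiguate(translations, toy_assoc)
for (s, t), v in sorted(weights.items()):
    print(f"{s:>6} -> {t:<12} {v:.3f}")
```

Because the candidates "europa", "union" and "handel" support each other, they absorb the weight of their competitors over the iterations; a candidate with no associated partners keeps shrinking after each normalization.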

  6. Approach • Measuring association strength • Pointwise mutual information • Dice coefficient • Log-likelihood ratio
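The three association measures on this slide can be computed from co-occurrence counts, as sketched below. The sketch assumes counts over some unit of text in the target-language corpus (for example documents or windows); that unit, the function names, and the treatment of zero counts are assumptions. The log-likelihood ratio here is Dunning's G² statistic on the 2x2 contingency table of the two terms.

```python
import math


def pmi(n_ab: int, n_a: int, n_b: int, n: int) -> float:
    """Pointwise mutual information: log P(a,b) / (P(a) P(b))."""
    if n_ab == 0:
        return float("-inf")  # the terms never co-occur
    return math.log((n_ab * n) / (n_a * n_b))


def dice(n_ab: int, n_a: int, n_b: int) -> float:
    """Dice coefficient: 2 * count(a and b) / (count(a) + count(b))."""
    return 2.0 * n_ab / (n_a + n_b)


def log_likelihood_ratio(n_ab: int, n_a: int, n_b: int, n: int) -> float:
    """Dunning's G^2 = 2 * sum(O * log(O / E)) over the 2x2 table."""
    observed = [
        n_ab,                  # a and b
        n_a - n_ab,            # a without b
        n_b - n_ab,            # b without a
        n - n_a - n_b + n_ab,  # neither
    ]
    row = [n_a, n_a, n - n_a, n - n_a]  # marginal count of the a-side
    col = [n_b, n - n_b, n_b, n - n_b]  # marginal count of the b-side
    g2 = 0.0
    for o, r, c in zip(observed, row, col):
        if o > 0:  # 0 * log 0 is taken as 0
            g2 += o * math.log(o * n / (r * c))  # O / E with E = r * c / n
    return 2.0 * g2


# Two terms seen in 10 of 100 units each, together in 8 of them.
print(pmi(8, 10, 10, 100), dice(8, 10, 10), log_likelihood_ratio(8, 10, 10, 100))
```

When the joint count equals what independence predicts, PMI and G² are both 0; stronger co-occurrence pushes all three scores up.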

  7. Experiment Result [Figure: retrieval effectiveness compared with the baseline (improvement), and differences for individual queries (topics)]

  8. Introduction • Two techniques for cross-language retrieval • Translate the document collection into the target language and apply monolingual retrieval • Translate the query into the target language and retrieve with the translated query • Three approaches may be used to produce the translations • A machine translation system • A bilingual dictionary • A parallel corpus to estimate translation probabilities

  9. Introduction • A word in one language can be translated into another language in a number of ways. • Penalty (English) => Elfmeter (soccer) or Strafe (punishment)

  10. Introduction • One approach to the word-selection problem is to use co-occurrences between terms. • Problem (with a larger number of terms): data sparseness • Remedies • Use very large corpora for counting co-occurrence frequencies • Use Internet search engines • Smoothing

  11. Experiment • Test data • CLEF 2003 English-to-German bilingual data • 56 topics used (title, description, narrative) • Morphological normalization • Source-language words (topics) are normalized to match entries in the bilingual dictionary • De-compounding: 5-grams • Weights are assigned to 5-gram substrings
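The 5-gram de-compounding step can be sketched as below. Splitting a word into overlapping character 5-grams follows the slide; how the weights are assigned is not stated, so spreading a term's weight evenly over its 5-grams is an assumption, and `char_ngrams` and `ngram_weights` are hypothetical names.

```python
from typing import Dict, List


def char_ngrams(word: str, n: int = 5) -> List[str]:
    """All overlapping character n-grams of a word (lowercased).
    Words of length <= n are kept whole."""
    word = word.lower()
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_weights(word: str, weight: float = 1.0, n: int = 5) -> Dict[str, float]:
    """Assumed scheme: split the term's weight evenly over its n-grams,
    summing the shares of any n-gram that occurs more than once."""
    grams = char_ngrams(word, n)
    out: Dict[str, float] = {}
    for g in grams:
        out[g] = out.get(g, 0.0) + weight / len(grams)
    return out


print(char_ngrams("Elfmeter"))  # ['elfme', 'lfmet', 'fmete', 'meter']
print(ngram_weights("Gewerkschaft"))
```

Overlapping n-grams let a German compound match documents that contain only one of its parts, without needing a compound splitter.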

  12. Experiment • Retrieval model • Lnu.ltc weighting scheme • Weighted document similarity • Statistical significance • Bootstrap method • Bootstrap samples • One-tailed significance testing (comparing two retrieval methods)
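The one-tailed bootstrap test on this slide can be sketched as a paired test over per-query average precision. This is a minimal sketch assuming the common shift method (center the per-query differences so the null hypothesis holds, resample them with replacement, and count how often a resampled mean reaches the observed mean); the function name, the number of samples, the seed, and the example values are illustrative assumptions, not the authors' settings.

```python
import random
from typing import List


def bootstrap_one_tailed(
    ap_a: List[float],
    ap_b: List[float],
    samples: int = 10000,
    seed: int = 0,
) -> float:
    """Estimated p-value for H0: method A is no better than method B."""
    if len(ap_a) != len(ap_b) or not ap_a:
        raise ValueError("need two equally long, non-empty score lists")
    diffs = [a - b for a, b in zip(ap_a, ap_b)]
    n = len(diffs)
    observed = sum(diffs) / n
    # Shift the differences to mean 0 so that the null hypothesis holds.
    centered = [d - observed for d in diffs]
    rng = random.Random(seed)
    extreme = 0
    for _ in range(samples):
        # One bootstrap sample: n query differences drawn with replacement.
        resample = [rng.choice(centered) for _ in range(n)]
        if sum(resample) / n >= observed:
            extreme += 1
    return extreme / samples


# Invented per-query average precision values, for illustration only.
baseline = [0.21, 0.35, 0.10, 0.44, 0.30, 0.05, 0.52, 0.18]
improved = [0.25, 0.40, 0.12, 0.43, 0.36, 0.09, 0.55, 0.22]
print(bootstrap_one_tailed(improved, baseline))
```

A small p-value means that a mean improvement as large as the observed one rarely arises from the centered (no-difference) distribution.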

  13. Experiment • Problems found in the experiment • With the log-likelihood ratio, average precision decreases for a number of individual queries. • Cause: unknown words • The original source-language word is included in the target-language query. • Example: topic "Women's Conference Beijing" • "Women" (a proper noun) is normalized, but neither "Woman" nor "Women" is found in the dictionary, so the untranslated "Women" is assigned weight = 1 • Result: 1. "Women" dominates document similarity 2. Most top-ranked documents contain "Women" as the only matching term.
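The unknown-word problem can be illustrated with a minimal sketch of dictionary-based query translation with uniform weights. It assumes, as the slide describes, that a word missing from the dictionary is passed through untranslated with weight 1. The function `translate_query` and the toy dictionary are hypothetical.

```python
from typing import Dict, List


def translate_query(terms: List[str], dictionary: Dict[str, List[str]]) -> Dict[str, float]:
    """Translate a query term by term with uniform weights.
    Unknown words are kept untranslated with weight 1."""
    query: Dict[str, float] = {}
    for term in terms:
        candidates = dictionary.get(term.lower())
        if candidates:
            # Known word: its weight of 1 is shared by all its translations.
            for cand in candidates:
                query[cand] = query.get(cand, 0.0) + 1.0 / len(candidates)
        else:
            # Unknown word: passed through as-is with the full weight 1.
            query[term] = query.get(term, 0.0) + 1.0
    return query


# Toy English-to-German dictionary; "women" is deliberately missing.
toy_dictionary = {
    "conference": ["konferenz", "tagung"],
    "beijing": ["peking"],
}
print(translate_query(["Women", "conference", "Beijing"], toy_dictionary))
```

Here the untranslated "Women" receives the full weight 1 while "conference" splits its weight between two candidates, which shows how a pass-through word can dominate the translated query.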

  14. Conclusions • Our approach improves retrieval effectiveness compared to a baseline using bilingual dictionary lookup. • Experimental results show that the log-likelihood ratio has a strong positive impact.

  15. My opinion • Advantage: • It only requires a bilingual dictionary and a monolingual corpus in the target language. • Disadvantage: • Unknown word • Apply
