Data-driven Paraphrasing

glenys

Presentation Transcript


  1. Data-driven Paraphrasing

  2. Paraphrases • Pair of sequences of words, both in the same language, that have the same meaning in at least some contexts • Examples: I spilled the beans and told Jacky I loved her vs. I exposed my secret and told Jacky I loved her; Beijing’s policy toward Taiwan vs. China’s policy toward Taiwan

  3. Paraphrase levels • Word level (synonym): house vs. home • Phrase level (sub-sentential): I spilled the beans vs. I exposed my secret • Sentence level: Beijing’s policy toward Taiwan remains unchanged vs. China did not change its policy toward Taiwan

  4. Paraphrase types • Structural paraphrases (She ate the apple vs. The apple was eaten by her) • Lexical paraphrases (My horse galloped away vs. My mount galloped away) • Phrasal paraphrases (I don’t have enough money to buy this yacht vs. I can’t afford this yacht) • Idiomatic paraphrases (I spilled the beans vs. I exposed the secret) • Referential paraphrases (Tuesday vs. The day before Wednesday)

  5. Textual entailment • Can the meaning of a target textual assertion (hypothesis, H) be inferred from a given text (T)? • Paraphrase is a special case of textual entailment, where each sequence entails the other • Example: T = "Fire bombs were thrown at the Tunisian embassy in Bern" entails H = "The Tunisian embassy in Switzerland was attacked" • Borrowed from Mirkin, MTML 2011

  6. Applications • Machine translation • Question answering • Information retrieval • …

  7. Motivation

  8. Motivation

  9. Paraphrasing techniques • Two main dimensions: corpus type and paraphrase level

  10. Corpora types • Monolingual corpus • Monolingual parallel corpus • Monolingual corpus of comparable documents • Bilingual parallel corpus • Bilingual corpus of comparable documents

  11. Corpus-based translation • Bilingual texts: an Arabic passage and its Hebrew translation. In English, both read: "I am here today to share with you an extraordinary journey, an extraordinary and rewarding journey in fact, which led me to train rats to save people's lives by detecting landmines and tuberculosis. As a child, I was passionate about two things. One was rodents. I had all kinds of rodents: mice, hamsters, gerbils, squirrels. Name it, I bred it and sold them to pet shops. (Laughter) I also had a passion for Africa. I grew up in a multicultural environment; we had African students at home, and I learned their stories, [such as] their different backgrounds, the dependence on imported know-how, goods and services, the rich cultural diversity. Africa truly fascinated me…"

  12. Europarl corpus

  13. Related works

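  14. Bannard & Callison-Burch (2005) • English: what is more, the relevant cost dynamic is completely under control / German: im übrigen ist die diesbezügliche kostenentwicklung völlig unter kontrolle • English: we owe it to the taxpayers to keep the costs in check / German: wir sind es den steuerzahlern schuldig die kosten unter kontrolle zu haben • The English phrases "under control" and "in check" are both aligned to the same German phrase "unter kontrolle", which makes them candidate paraphrases • (figure labels: English, Spanish, French)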

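  15. Bannard & Callison-Burch (2005) - Pivoting • The paraphrase score for a candidate $e_2$ of an English phrase $e_1$ is obtained by summing over the foreign (pivot) phrases $f$ aligned to both: $p(e_2 \mid e_1) = \sum_{f} p(f \mid e_1)\, p(e_2 \mid f)$, and the best paraphrase is $\hat{e}_2 = \arg\max_{e_2 \neq e_1} p(e_2 \mid e_1)$ • The translation probabilities are calculated by maximum likelihood estimation, e.g.: $p(f \mid e_1) = \frac{\mathrm{count}(f, e_1)}{\sum_{f'} \mathrm{count}(f', e_1)}$ • Results: ~70% correct (over 289 tested phrases)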
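Here is a minimal Python sketch of pivot-based paraphrase scoring. It scores a candidate e2 for a phrase e1 as the sum, over pivot phrases f, of p(f|e1)·p(e2|f). Both conditional probabilities are estimated by relative frequency from phrase-pair counts. The counts, the pivot phrase "im griff", and the function names are all hypothetical. In practice the counts would come from phrase extraction over a word-aligned bilingual corpus such as Europarl.

```python
from collections import defaultdict

# Hypothetical (English phrase, German pivot phrase) -> count table.
# Invented for illustration only; real counts come from aligned corpora.
ALIGNED_COUNTS = {
    ("under control", "unter kontrolle"): 3,
    ("in check", "unter kontrolle"): 1,
    ("under control", "im griff"): 1,
    ("in hand", "im griff"): 1,
}


def conditional_tables(counts):
    """Estimate p(f|e) and p(e|f) by relative frequency (maximum likelihood)."""
    e_totals = defaultdict(int)
    f_totals = defaultdict(int)
    for (e, f), c in counts.items():
        e_totals[e] += c
        f_totals[f] += c
    p_f_given_e = {(e, f): c / e_totals[e] for (e, f), c in counts.items()}
    p_e_given_f = {(e, f): c / f_totals[f] for (e, f), c in counts.items()}
    return p_f_given_e, p_e_given_f


def paraphrase_probs(e1, counts):
    """Return p(e2|e1) = sum_f p(f|e1) * p(e2|f) for every e2 != e1.

    e1 itself is excluded (as in the argmax over e2 != e1), so the
    returned scores need not sum to 1.
    """
    p_f_given_e, p_e_given_f = conditional_tables(counts)
    scores = defaultdict(float)
    for (e, f), pf in p_f_given_e.items():
        if e != e1:
            continue
        # Every English phrase e2 sharing the pivot f gets probability mass.
        for (e2, f2), pe in p_e_given_f.items():
            if f2 == f and e2 != e1:
                scores[e2] += pf * pe
    return dict(scores)


scores = paraphrase_probs("under control", ALIGNED_COUNTS)
best = max(scores, key=scores.get)
```

With these toy counts, p(in check | under control) = 3/4 · 1/4 = 0.1875 and p(in hand | under control) = 1/4 · 1/2 = 0.125. So "in check" is ranked first.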

  16. Bannard & Callison-Burch (2005) - Pivoting • Performance depends on the pivot language • For example, English => Arabic (Madnani and Dorr, 2010) generates many paraphrases that are different inflected forms of the same phrase (e.g., caused clouds vs. causing clouds)

  17. Modified Arabic paraphrase definition (our work) • We include among paraphrases pairs of phrases that express the same meaning, regardless of their inflection for number, gender, and person • We show that such paraphrases improve Arabic => English machine translation • Bar and Dershowitz, CICLing 2014

  18. Paraphrase patterns • Semantically equivalent patterns; a pattern generally contains two parts: words and slots • Example: X solves Y vs. Y is solved by X vs. X finds a solution to Y
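Here is a minimal sketch of applying a set of equivalent patterns once they have been extracted. A sentence is matched against one pattern with a regular expression, which captures the slot fillers X and Y. The fillers are then substituted into the other patterns. Surface-string regex matching is a simplifying assumption made for illustration; pattern extraction methods such as Zhao et al. (2008) work over dependency parses. `EQUIVALENT_PATTERNS` and both function names are hypothetical.

```python
import re

# Hypothetical pattern set, taken from the example on this slide.
EQUIVALENT_PATTERNS = ["X solves Y", "Y is solved by X", "X finds a solution to Y"]


def pattern_to_regex(pattern):
    """Turn a pattern with slots X and Y into a regex with named groups."""
    parts = []
    for token in pattern.split():
        if token in ("X", "Y"):
            parts.append(f"(?P<{token}>.+?)")  # slot: capture one or more words
        else:
            parts.append(re.escape(token))  # fixed word: match literally
    return re.compile("^" + r"\s+".join(parts) + "$")


def paraphrase_with_patterns(sentence, patterns):
    """On the first pattern that matches, fill every other pattern's slots."""
    for source in patterns:
        match = pattern_to_regex(source).match(sentence)
        if match:
            fillers = match.groupdict()
            return [
                " ".join(fillers.get(tok, tok) for tok in target.split())
                for target in patterns
                if target != source
            ]
    return []  # no pattern matched


print(paraphrase_with_patterns("the committee solves the problem", EQUIVALENT_PATTERNS))
```

Here "the committee solves the problem" matches "X solves Y" with X = "the committee" and Y = "the problem". It is rewritten as "the problem is solved by the committee" and "the committee finds a solution to the problem".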

  19. Zhao et al. (2008) – Pivoting for paraphrase patterns • Use pivoting on dependency-parsed corpora • Extract patterns by treating complete paths as variables • (figure label: NN)

  20. Zhao et al. (2008) – Pivoting for paraphrase patterns • Align with the pivot target-language pattern (in their original work: Chinese)

  21. Zhao et al. (2008) – Pivoting for paraphrase patterns • 5 types of extracted patterns

  22. Marton et al. (2009) – Distributional similarity • Large monolingual corpus • Candidate phrases are compared by the cosine similarity of their distributional profiles (DP) • (figure labels: English, Spanish, Chinese)
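Here is a minimal sketch of comparing candidate phrases by the cosine similarity of their distributional profiles. It assumes a profile is the bag of words found within a fixed window (here two tokens) on either side of each occurrence of the phrase, weighted by raw counts. Marton et al. (2009) may define contexts and weight them differently. The toy corpus, `distributional_profile`, and the `window` parameter are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy monolingual corpus (whitespace-tokenized).
CORPUS = [
    "keep the costs under control this year",
    "keep the costs in check this year",
    "the dog ran across the yard",
]


def distributional_profile(phrase, sentences, window=2):
    """Count words within `window` tokens of each occurrence of `phrase`."""
    target = phrase.split()
    n = len(target)
    profile = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                profile.update(left + right)
    return profile


def cosine(p, q):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p if w in q)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


under_control = distributional_profile("under control", CORPUS)
in_check = distributional_profile("in check", CORPUS)
print(cosine(under_control, in_check))
```

In this toy corpus, "under control" and "in check" occur in identical contexts (the, costs, this, year), so their cosine similarity is 1.0. A phrase from an unrelated sentence, such as "ran across", shares only "the" with them and scores much lower.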
