
A General Optimization Framework for Smoothing Language Models on Graph Structures

Qiaozhu Mei, Duo Zhang, ChengXiang Zhai. University of Illinois at Urbana-Champaign.


  1. A General Optimization Framework for Smoothing Language Models on Graph Structures Qiaozhu Mei, Duo Zhang, ChengXiang Zhai University of Illinois at Urbana-Champaign

  2. Document d (a text mining paper) → Doc Language Model (LM) θd: p(w|d): text 4/100 = 0.04, mining 3/100 = 0.03, clustering 1/100 = 0.01, …, data = 0, computing = 0, … • Smoothed Doc LM θd': p(w|d'): text = 0.039, mining = 0.028, clustering = 0.01, …, data = 0.001, computing = 0.0005, … • Query q (data mining) → Query Language Model θq: p(w|q): data ½ = 0.5, mining ½ = 0.5 • Smoothed query LM p(w|q'): data = 0.4, mining = 0.4, clustering = 0.1, … • Similarity function: Kullback-Leibler divergence retrieval method
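
The KL-divergence retrieval method on this slide ranks documents by −D(θq ‖ θd). Because the query's own entropy is the same for every document, a common rank-equivalent form is Σ_w p(w|q) log p(w|d). The Python sketch below (the function name `kl_score` is hypothetical) uses the slide's numbers to show why the unsmoothed document LM cannot be used: the query word "data" has probability 0 in it, so the score collapses to −∞.

```python
import math

def kl_score(query_lm, doc_lm):
    """Rank-equivalent KL-divergence score: sum_w p(w|q) * log p(w|d).

    -D(q || d) = sum_w p(w|q) log p(w|d) - sum_w p(w|q) log p(w|q);
    the second sum does not depend on d, so it is dropped for ranking.
    """
    score = 0.0
    for w, p_q in query_lm.items():
        p_d = doc_lm.get(w, 0.0)
        if p_d == 0.0:
            return float("-inf")  # an unseen query word makes log p(w|d) undefined
        score += p_q * math.log(p_d)
    return score

query = {"data": 0.5, "mining": 0.5}
mle_doc = {"text": 0.04, "mining": 0.03, "clustering": 0.01, "data": 0.0, "computing": 0.0}
smoothed_doc = {"text": 0.039, "mining": 0.028, "clustering": 0.01,
                "data": 0.001, "computing": 0.0005}

print(kl_score(query, mle_doc))       # -inf
print(kl_score(query, smoothed_doc))  # 0.5 * ln(0.001) + 0.5 * ln(0.028)
```

With the smoothed θd', the same query receives a finite score, so the document can be ranked at all.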

  3. Smoothing a Document Language Model • Retrieval performance depends on estimating the LM, which in turn depends on smoothing the LM • Goals: estimate a more accurate distribution from sparse data; assign non-zero probability to unseen words • MLE: text 4/100 = 0.04, mining 3/100 = 0.03, Assoc. 1/100 = 0.01, clustering 1/100 = 0.01, …, data = 0, computing = 0, … • Two smoothed estimates: (a) text = 0.039, mining = 0.028, Assoc. = 0.009, clustering = 0.01, …, data = 0.001, computing = 0.0005, …; (b) text = 0.038, mining = 0.026, Assoc. = 0.008, clustering = 0.01, …, data = 0.002, computing = 0.001, …
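
Later slides apply "additional Dirichlet smoothing" after graph-based smoothing. As a hedged illustration of assigning non-zero probability to unseen words, here is the standard Dirichlet prior estimate p(w|d) = (c(w,d) + μ·p(w|C)) / (|d| + μ). The toy collection model, μ = 10 in the example, the default μ = 2000 and the function name are assumptions for illustration, not values from these slides.

```python
def dirichlet_smooth(doc_counts, collection_lm, mu=2000.0):
    """Dirichlet prior smoothing: p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu).

    Assumes collection_lm covers every word in doc_counts and sums to 1,
    so the result is a proper distribution over the collection vocabulary.
    """
    doc_len = sum(doc_counts.values())
    return {w: (doc_counts.get(w, 0) + mu * p_c) / (doc_len + mu)
            for w, p_c in collection_lm.items()}

doc = {"text": 4, "mining": 3, "clustering": 1}
# hypothetical collection model, for illustration only
collection = {"text": 0.3, "mining": 0.3, "clustering": 0.2, "data": 0.1, "computing": 0.1}
smoothed = dirichlet_smooth(doc, collection, mu=10)
print(smoothed["data"])  # (0 + 10 * 0.1) / (8 + 10): unseen, yet non-zero
```

Larger μ pulls the estimate further toward the collection model; short documents are smoothed more than long ones because |d| appears in the denominator.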

  4. Previous Work on Smoothing • Estimate a reference language model θref using the collection (corpus) [Ponte & Croft 98], clusters [Liu & Croft 04], or nearest neighbors [Kurland & Lee 04] • Interpolate the MLE with the reference LM
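
Interpolating the MLE with a reference model is, in its simplest (Jelinek-Mercer) form, p(w|d) = (1 − λ)·p_ML(w|d) + λ·p(w|θref). The sketch below assumes that form; λ, the toy θref and the function name are illustrative, and the cited methods differ in how θref is built (collection, clusters, neighbors) and how the interpolation weight is set.

```python
def interpolate_with_reference(doc_counts, ref_lm, lam=0.5):
    """p(w|d) = (1 - lam) * p_ML(w|d) + lam * p(w|theta_ref)."""
    doc_len = sum(doc_counts.values())
    vocab = set(doc_counts) | set(ref_lm)
    return {w: (1 - lam) * doc_counts.get(w, 0) / doc_len + lam * ref_lm.get(w, 0.0)
            for w in vocab}

doc = {"text": 2, "mining": 1, "clustering": 1}
# theta_ref could be estimated from the collection, a cluster, or nearest neighbors;
# here it is a hypothetical toy distribution
ref = {"text": 0.25, "mining": 0.25, "data": 0.25, "computing": 0.25}
smoothed = interpolate_with_reference(doc, ref, lam=0.2)
print(smoothed["data"])  # 0.2 * 0.25
```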

  5. Problems of Existing Methods • Smoothing with global background: ignores collection structure • Smoothing with document clusters: ignores local structure inside clusters • Smoothing using neighbor documents: ignores global structure • Different heuristics on θref and interpolation: no clear objective function for optimization, and no guidance on how to further improve the existing methods

  6. Research Questions • What is the right corpus structure to use? • What are the criteria for a good smoothing method? An accurate language model? • What do we end up optimizing? • Could there be a general optimization framework?

  7. Our Contribution • Formulation of smoothing as optimization over graph structures • A general optimization framework for smoothing both document LMs and query LMs • Novel instantiations of the framework lead to more effective smoothing methods

  8. A Graph-based Formulation of Smoothing • A novel and general view of smoothing • Collection = graph (of documents) • P(w|d) = surface on top of the graph: P(w|d1), P(w|d2), … at documents d1, d2, … (MLE surface vs. smoothed surface, shown as a projection on a plane) • Smoothed LM = smoothed surface!

  9. Covering Existing Models • Collection = graph; smoothed LM = smoothed surfaces • Smoothing with graph structure covers: smoothing with global background (star graph centered on the background); smoothing with document clusters (forest with pseudo-docs, clusters C1, C2, C3, C4); smoothing with nearest neighbors (local graph)

  10. Instantiations of the Formulation: Document Graphs

  11. Smoothing over Word Graphs • Similarity graph of words • Given d, {P(w|d)} = surface over the word graph, with value P(wu|d)/Deg(u) at word vertex u • Smoothed LM = smoothed surface!

  12. The General Objective of Smoothing • Fidelity to the MLE, weighted by the importance of vertices • Smoothness of the surface, weighted by the weights of edges (1/distance)
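
The slide's equation did not survive extraction. A quadratic objective consistent with its labels is sketched below; the exact form and the trade-off parameter λ ∈ [0, 1] are assumptions here. f̃_u is the MLE value at vertex u, f_u the smoothed value, w(u) the vertex importance and w(u,v) the edge weight, with each undirected edge counted once. The first term rewards fidelity to the MLE; the second penalizes neighboring vertices whose values differ.

```latex
\begin{aligned}
O(f) ={}& (1-\lambda) \sum_{u \in V} w(u)\,\bigl(f_u - \tilde{f}_u\bigr)^2 \\
        &+ \lambda \sum_{(u,v) \in E} w(u,v)\,\bigl(f_u - f_v\bigr)^2
\end{aligned}
```

With λ = 0 the minimizer is the MLE itself; as λ grows, the surface is pulled toward a constant over each connected component.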

  13. The Optimization Framework • Criteria: fidelity (keep close to the MLE); surface smoothness (local and global consistency) • Constraint: • Unified optimization objective: fidelity to MLE + smoothness of the surface

  14. The Procedure of Smoothing • Define graph: construct a document/word graph; define reasonable w(u) and w(u,v) • Define surfaces: define reasonable f_u • Smooth surfaces: iterative updating, then additional Dirichlet smoothing
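
One way to see where "iterative updating" can come from: assume the quadratic objective O(f) = (1 − λ) Σ_u w(u)(f_u − f̃_u)² + λ Σ_{(u,v)∈E} w(u,v)(f_u − f_v)², where f̃_u is the MLE value at vertex u, N(u) is the neighbor set of u, and λ is an assumed trade-off parameter. Setting the partial derivative to zero and choosing w(u) = Deg(u), as the instantiations on slides 15 and 16 do, gives a fixed-point update:

```latex
\begin{gathered}
\frac{\partial O}{\partial f_u}
  = 2(1-\lambda)\,w(u)\,\bigl(f_u - \tilde{f}_u\bigr)
  + 2\lambda \sum_{v \in N(u)} w(u,v)\,\bigl(f_u - f_v\bigr) = 0 \\[4pt]
\text{with } w(u) = \mathrm{Deg}(u) = \sum_{v \in N(u)} w(u,v):\qquad
f_u = (1-\lambda)\,\tilde{f}_u + \lambda \sum_{v \in N(u)} \frac{w(u,v)}{\mathrm{Deg}(u)}\, f_v
\end{gathered}
```

Repeatedly applying the right-hand side converges for 0 ≤ λ < 1: the weights w(u,v)/Deg(u) sum to 1, so each step shrinks the gap between two iterates by at least a factor of λ.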

  15. Smoothing Language Models using a Document Graph • Construct a kNN graph of documents • w(u) = Deg(u); w(u,v) = cosine similarity • f_u = p(w|d_u) (document language model), or alternatively f_u = s(q, d_u) (document relevance score, e.g., (Diaz 05)) • Additional Dirichlet smoothing
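
A minimal Python sketch of the document-graph instantiation, assuming the fixed-point update f_u ← (1 − λ)·p̃(w|d_u) + λ·Σ_v w(u,v)/Deg(u)·f_v with f_u = p(w|d_u), a symmetric kNN graph with cosine edge weights, and a fixed iteration count. λ, k, the iteration count and all function names are illustrative assumptions; the additional Dirichlet smoothing step is omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors (dicts)."""
    dot = sum(x * b.get(w, 0) for w, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_graph(docs, k):
    """Symmetric kNN graph: edge (u, v) weighted by cosine(u, v) if either picks the other."""
    weights = [{} for _ in docs]
    for u, du in enumerate(docs):
        sims = sorted(((cosine(du, dv), v) for v, dv in enumerate(docs) if v != u), reverse=True)
        for s, v in sims[:k]:
            if s > 0:
                weights[u][v] = weights[v][u] = s
    return weights

def smooth_on_doc_graph(docs, k=2, lam=0.5, iters=50):
    """Iterate f_u <- (1 - lam) * mle_u + lam * sum_v w(u,v) / Deg(u) * f_v for every word."""
    weights = knn_graph(docs, k)
    mle = [{w: c / sum(d.values()) for w, c in d.items()} for d in docs]
    vocab = set().union(*docs)
    f = [dict(m) for m in mle]
    for _ in range(iters):
        new_f = []
        for u, nbrs in enumerate(weights):
            deg = sum(nbrs.values())
            row = {}
            for w in vocab:
                if deg:
                    neigh = sum(s * f[v].get(w, 0.0) for v, s in nbrs.items()) / deg
                else:
                    neigh = mle[u].get(w, 0.0)  # isolated document: keep its MLE
                row[w] = (1 - lam) * mle[u].get(w, 0.0) + lam * neigh
            new_f.append(row)
        f = new_f
    return f

docs = [
    {"text": 4, "mining": 3, "clustering": 1},
    {"data": 2, "mining": 2},
    {"text": 1, "data": 1, "computing": 1},
]
smoothed = smooth_on_doc_graph(docs)
print(round(smoothed[0]["data"], 4))  # positive, although "data" never occurs in docs[0]
```

Because each document's neighbor weights sum to 1, every smoothed p(·|d_u) still sums to 1, and document 0 picks up probability for "data" from its neighbors.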

  16. Smoothing Language Models using a Word Graph • Construct a kNN graph of words • w(u) = Deg(u); w(u,v) = PMI • f_u = p(w_u|d)/Deg(u) for a document language model, or p(w_u|q)/Deg(u) for a query language model • Additional Dirichlet smoothing
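
The word-graph instantiation can be sketched the same way. With f_u = p(w_u|d)/Deg(u) and w(u) = Deg(u), the fixed point rewritten in terms of probabilities is p(w_u|d) ← (1 − λ)·p̃(w_u|d) + λ·Σ_v w(u,v)/Deg(v)·p(w_v|d). The sketch assumes PMI computed from document-level co-occurrence, keeps only positive-PMI edges (negative PMI would give negative weights), and treats isolated words as self-loops. These choices, λ, k and the function names are assumptions, not details from the slides.

```python
import math
from itertools import combinations

def pmi_word_graph(docs, k=3):
    """Word graph with document-level PMI weights; keeps each word's top-k positive-PMI neighbors."""
    n = len(docs)
    df, co = {}, {}
    for doc in docs:
        words = sorted(set(doc))
        for w in words:
            df[w] = df.get(w, 0) + 1
        for a, b in combinations(words, 2):
            co[(a, b)] = co.get((a, b), 0) + 1
    pmi = {w: {} for w in df}
    for (a, b), c in co.items():
        score = math.log(c * n / (df[a] * df[b]))  # log p(a,b) / (p(a) p(b))
        if score > 0:  # assumption: drop non-positive PMI so edge weights stay positive
            pmi[a][b] = pmi[b][a] = score
    graph = {w: {} for w in df}
    for u, nbrs in pmi.items():
        for v, s in sorted(nbrs.items(), key=lambda item: -item[1])[:k]:
            graph[u][v] = graph[v][u] = s
    return graph

def smooth_on_word_graph(p_mle, graph, lam=0.5, iters=50):
    """Iterate p_u <- (1 - lam) * mle_u + lam * sum_v w(u,v) / Deg(v) * p_v.

    Assumes every word with non-zero p_mle is a vertex of the graph.
    """
    deg = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    p = {w: p_mle.get(w, 0.0) for w in graph}
    for _ in range(iters):
        new_p = {}
        for u in graph:
            if deg[u] == 0:
                walk = p[u]  # isolated word: acts as a self-loop, so no mass is lost
            else:
                walk = sum(s / deg[v] * p[v] for v, s in graph[u].items())
            new_p[u] = (1 - lam) * p_mle.get(u, 0.0) + lam * walk
        p = new_p
    return p

docs = [["text", "mining"], ["data", "mining"], ["data", "mining", "clustering"],
        ["text", "clustering"], ["computing", "data"]]
graph = pmi_word_graph(docs)
doc_lm = {"text": 0.5, "mining": 0.375, "clustering": 0.125}  # MLE of "a text mining paper"
smoothed = smooth_on_word_graph(doc_lm, graph)
```

Dividing by Deg(v), the degree of the word the mass comes from, is what keeps the smoothed distribution normalized.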

  17. Intuitive Interpretation: Smoothing using a Word Graph • Stationary distribution of a Markov chain • Writing a document = random walk on the word Markov chain; write down w whenever passing w
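
One way to make the Markov-chain reading concrete, assuming the word-graph update p(w_u|d) ← (1 − λ)·p̃(w_u|d) + λ·Σ_v w(u,v)/Deg(v)·p(w_v|d) and that every word has at least one neighbor: at each step the walker follows an edge out of its current word with probability λ (moving from w_v to w_u in proportion to w(u,v)), and otherwise jumps to a word drawn from the MLE p̃(·|d). Its stationary distribution π satisfies the same fixed point:

```latex
\begin{aligned}
P(w_v \to w_u) &= \lambda\,\frac{w(u,v)}{\mathrm{Deg}(v)} + (1-\lambda)\,\tilde{p}(w_u \mid d) \\
\pi_u = \sum_{v} \pi_v\, P(w_v \to w_u)
  &= (1-\lambda)\,\tilde{p}(w_u \mid d) + \lambda \sum_{v} \frac{w(u,v)}{\mathrm{Deg}(v)}\,\pi_v
\end{aligned}
```

The last step uses Σ_v π_v = 1, so π_u = p(w_u|d): words are written down in proportion to how often the walk passes them.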

  18. Intuitive Interpretation: Smoothing using a Document Graph • Absorption probability to the “1” state • Writing a word w in a document = random walk on the document Markov chain; write down w if reaching “1” (the other absorbing state is “0”) • Act as neighbors do
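
Similarly, assuming the document-graph update f_u ← (1 − λ)·p̃(w|d_u) + λ·Σ_v w(u,v)/Deg(u)·f_v: fix a word w and add two absorbing states, “1” (write w) and “0” (do not). From document d_u the walker moves to a neighbor d_v with probability λ·w(u,v)/Deg(u), is absorbed in “1” with probability (1 − λ)·p̃(w|d_u), and in “0” otherwise. The probability a_u of ending in “1” then satisfies:

```latex
a_u = (1-\lambda)\,\tilde{p}(w \mid d_u) \cdot 1
    + (1-\lambda)\,\bigl(1 - \tilde{p}(w \mid d_u)\bigr) \cdot 0
    + \lambda \sum_{v \in N(u)} \frac{w(u,v)}{\mathrm{Deg}(u)}\, a_v
```

This is the document-graph update with a_u = p(w|d_u): a document writes w as often as its neighbors would.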

  19. Experiments (cf. Liu and Croft ’04; Tao ’06) • Smooth document LM on document graph (DMDG) • Smooth document LM on word graph (DMWG) • Smooth relevance score on document graph (DSDG) • Smooth query LM on word graph (QMWG) • Evaluate using MAP (mean average precision)
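
The experiments report MAP. For reference, a minimal sketch of the standard computation; the function names, the dict-of-rankings layout and the toy runs are illustrative assumptions, not data from the paper.

```python
def average_precision(ranking, relevant):
    """Sum of precision@i over the ranks i holding a relevant document, divided by
    the number of relevant documents (relevant documents never retrieved contribute 0)."""
    hits, total = 0, 0.0
    for i, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs, qrels):
    """runs: query id -> ranked list of doc ids; qrels: query id -> set of relevant doc ids."""
    return sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)

runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d6"}}
print(mean_average_precision(runs, qrels))  # ((1 + 2/3) / 2 + 1/2) / 2
```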

  20. Effectiveness of the Framework • Wilcoxon test: *, **, *** mean significance at levels 0.1, 0.05, 0.01 • † DMWG: reranking the top 3000 results, which usually yields lower performance than ranking all the documents • Graph-based smoothing >> baseline • Smoothing doc LM >> smoothing relevance score >> smoothing query LM

  21. Comparison with Existing Models • Graph-based smoothing > state of the art • More iterations > single iteration (similar to DELM)

  22. Combined with Pseudo-Feedback • Smooth the query q on the word graph (w) • Retrieve the top docs • Smooth the documents d on the word graph (w) and rerank

  23. Related Work • Language modeling in Information Retrieval; smoothing using collection model • (Ponte & Croft 98); (Hiemstra & Kraaij 98); (Miller et al. 99); (Zhai & Lafferty 01), etc. • Smoothing using corpus structures • Cluster structure: (Liu & Croft 04), etc. • Nearest Neighbors: (Kurland & Lee 04), (Tao et al. 06) • Relevance score propagation (Diaz 05), (Qin et al. 05) • Graph-based learning • (Zhu et al. 03); (Zhou et al. 04), etc.

  24. Conclusions • Smoothing language models using document/word graphs • A general optimization framework • Various effective instantiations • Improved performance over state-of-the-art • Future Work: • Combine document graphs with word graphs • Study alternative ways of constructing graphs

  25. Thanks!

  26. Parameter Tuning • Fast Convergence
