1 / 34

Multi-document summarization using A* search and discriminative training

Multi-document summarization using A* search and discriminative training. Ahmet Aker Trevor Cohn Robert Gaizauskas Department of Computer Science University of Sheffield, Sheffield, S1 4DP, UK { a.aker , t.cohn , r.gaizauskas }@dcs.shef.ac.uk. Reference:

Download Presentation

Multi-document summarization using A* search and discriminative training

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multi-document summarization using A* search and discriminative training Ahmet Aker Trevor Cohn Robert Gaizauskas Department of Computer Science University of Sheffield, Sheffield, S1 4DP, UK {a.aker, t.cohn, r.gaizauskas}@dcs.shef.ac.uk Reference: AhmetAker, Trevor Cohn, Robert J. Gaizauskas Multi-Document Summarization Using A* Search and Discriminative Learning. EMNLP 2010: 482-491

  2. Outline • Introduction • Summarization Model • A* Search • Training • Experimental results • Conclusion

  3. Introduction • Multi-document summarization aims to presentmultiple documents in form of a short summary • Most multi-document summarization systems define a model which assigns a score to a candidate summarybased on the features of the sentences included in the summary

  4. a short summary multi-document summarization system

  5. Introduction • The research challenges are then twofold: • 1) the search problem of finding the best scoring summary for a given document set • 2) the training problem of learning the model parameters to best describe a training set consisting of pairs of document sets with model or reference summaries

  6. Introduction • Search is typically performed by a greedy algorithm which selects each sentence in decreasing order of model score until the desired summary length is reached • We show in this paper that the search problem can be solved optimally and efficiently using A* search

  7. Introduction • Framing summarization as search suggests that many of the popular training techniques are maximizing the wrong objective • These approaches train a classifier, regression or ranking model to distinguish between good and bad sentences under an evaluation metric, e.g., ROUGE • The model is then used during search to find a summary composed of high scoring (‘good’) sentences

  8. Introduction • However, there is a disconnect between the model used for training and the model used for prediction • We present a solution to this disconnect in the form of a training algorithm that optimizes the full prediction model directly with the search algorithm intact • The training algorithm learns parameters such that the best scoring whole summary under the model has a high score under the evaluation metric

  9. Summarization Model • Extractivemulti-document summarization aims to find the most important sentences from a set of documents, which are then collated and presented to the user in form of a short summary • Extractive vs Abstractive summarization

  10. Summarization Model • we define a linear model which scores summaries as the weighted sum of their features • x is the document set, composed of ksentences are the set of selected sentenceindices is a feature function which returns a vector of featuresfor the candidate summary are the model parameters (1)

  11. Summarization Model • Assume that the features decompose with the sentences in the summary, and therefore the scoring function also decomposes along the same lines • Under this model, the search problem is to solve for which we develop a best-first algorithm using A* search. (2) (3)

  12. A* Search • The prediction problem is to find the best scoring extractive summary up to a given length, L • This appears to be a simple problem that might be solved efficiently with a greedy algorithm, say by taking the sentences in order of decreasing score and stopping just before the summary exceeds the length threshold. • However, the greedy algorithm cannot be guaranteed to find the best summary; to do so requires arbitrary backtracking to revise previous incorrect decisions

  13. A* Search • The problem of constructing the summary can be considered a search problem • The search graph starts with an empty summary (the starting state) and each outgoing edge adds a sentence to produce a subsequent state, and is assigned a score under the model • A goal state is any state with no more words than the given threshold.

  14. A* Search 2 • (中央社記者吳佳穎台北12日電)宏達電今天在台灣率先推出2款微軟芒果機,並說芒果機第4季表現可期;另外,宏達電也對北亞市場深具信心,表示未來3年內,將成為除韓國之外,智慧手機市占第1的品牌… 志在一般消費者和商務人士 1 這兩款手機主打社群功能整合 宏達電今天在台灣率先推出2款微軟芒果機 • (台灣醒報記者鍾禎祥台北報導)醒報記者鍾禎祥台北報導】微軟今天在台灣一口氣發表兩款搭載「芒果」系統的新智慧手機,企圖在iPhone和谷歌Android平台手機夾攻下殺出一條血路。這兩款手機主打社群功能整合,使用者能隨時看到推特、臉書、MSN、E-Mail、電話號碼的動態,另外還內建78款XboxLive遊戲和完整Office功能,志在一般消費者和商務人士。… 2 Starting state 微軟今天在台灣一口氣發表兩款搭載「芒果」系統的新智慧手機 2 並說芒果機第4季表現可期 1 智慧手機市占第1的品牌… 2

  15. A* Search • The summarisation problem is then equivalent to finding the best scoring path (summed over the edge scores) between the start state and a goal state • A* is a best-first search algorithm which can efficiently find the best scoring path or the n-best paths (unlike the greedy algorithm which is not optimal, and the backtracking variant which is not efficient).

  16. A* Search • The search procedure requires • a scoring function for each state • a heuristic function which estimates the additional score to get from a given state to a goal state • For the search to be optimal – guaranteed to find the best scoring path as the first solution – the heuristic must be admissible, meaning that it bounds from above the score for reaching a goal state

  17. A* Search • we now return to the problem of defining the heuristic function, h(y; x, l) which provides an upper bound on the additional score achievable in reaching a goal state from state y

  18. A* Search-h1() • algorithm 2 simply finds the maximum score per word from the set of unused sentences and then extrapolates this out over the remaining words available to the length threshold • we can ‘reuse’ a high scoring short sentence many times despite this being disallowed by the model.

  19. A* Search-h2() • An improved bound h2incrementally adds each sentence in order of its score-per-word until the length limit is reached • If the limit is to be exceeded, the heuristic scales down the final sentence’s score based on the fraction of words than can be used to reach the limit • The fractional usage of the final sentence in h2 could be considered overly optimistic, especially when the state has length just shy of the limit L

  20. A* Search-h2()

  21. A* Search-h3() • If the next best ranked sentence is a long one, then it will be used in the heuristic to over-estimate of the state • This is complicated to correct, and doing so exactly would require full backtracking which is intractable and would obviate the entire point of using A* search • Instead we use a subtle modification in h3 which is equivalent to h2 except in the instance where the next best score/word sentence is too long, where it skips over these sentences until it finds the best scoring sentence that does fit

  22. A* Search-h3()

  23. A* Search • Example of the A* search graph created to find the two top scoring summaries of length 7

  24. A* Search • The search procedure requires a scoring function for each state, here s(y|x) from (2), and

  25. Efficiency of A* search search • for each document set generated the 100-best summaries with word limit L = 200 • Surprisingly the aggregated heuristic, h2, is not considerably more efficient than the uniform heuristic h1, despite bounding the cost more precisely

  26. Training • We frame the training problem as one of finding model parameters, , such that the predicted output, closely matches the gold standard, r(abstractive) • We adopt the standard machinelearning terminology of loss functions, which measurethe degree of error in the prediction,

  27. Training • In our case the accuracy is measured by the ROUGEscore, R, and the loss is simply 1 - R. The trainingproblem is to solve • We use theA* search algorithm to construct these n-best lists,and use MERT to optimise the ROUGE score on thetraining set for the R-1, R-2 and R-SU4 variants ofthe metric (4)

  28. Experimental results • Training data • MERT is maximizing the metric for which it was trained

  29. Experimental results • in domain data

  30. Experimental results • out of domain data

  31. Experimental results • From Table 5 and 6 we can see that the summaries obtained from Virtual Tourist captions (in domain data) score roughly the same as the summaries generated using web-documents (out of domain data) as input. • A possible explanation is that in many cases the VirtualTourist original captions contain text from Wikipedia articles, which are also returned as results from the web search

  32. Results • Manual Evaluation

  33. Conclusion • we have proposed an A* search approach for generating a summary from a ranked list of sentences and learning feature weights for a feature based extractive multi-document summarization system • In this paper we experimented with sentence-local features. In the future we plan to expand this feature set with global features, especially ones measuring lexical diversity in the summaries to reduce the redundancy in them

More Related