
HCC class lecture 14 comments

Learn about Latent Semantic Analysis (LSA) limitations and explore Non-Negative Matrix Factorization (NMF) as an alternative for theme extraction in text clustering. Understand how NMF components track themes effectively and the significance of using KL-divergence in NMF for clustering tasks.


Presentation Transcript


  1. HCC class lecture 14 comments, John Canny, 3/9/05

  2. Administrivia

  3. Clustering: LSA again • The input is a matrix. Rows represent text blocks (sentences, paragraphs or documents) • Columns are distinct terms • Matrix elements are term counts (× tf-idf weight) • The idea is to “factor” this matrix as M = A D B, where M is text blocks × terms, A is text blocks × themes, D is themes × themes, and B is themes × terms.
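
A minimal sketch of this factorization, assuming scikit-learn and a made-up toy corpus (neither is specified in the lecture); TruncatedSVD is one standard way to compute the LSA decomposition:

```python
# Sketch of LSA: factor the text-block x term matrix M with a truncated SVD.
# The corpus, k, and the use of scikit-learn are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the cat sat on the mat",
    "dogs and cats make good pets",
    "stock prices fell sharply on monday",
    "the market rallied after the earnings news",
]

# M: text blocks x terms, term counts scaled by tf-idf as on the slide.
vectorizer = TfidfVectorizer()
M = vectorizer.fit_transform(docs)

k = 2                                  # number of themes to keep
svd = TruncatedSVD(n_components=k)
A = svd.fit_transform(M)               # text blocks x themes (A with D folded in)
B = svd.components_                    # themes x terms
D = svd.singular_values_               # the diagonal entries of D

print(A.shape, B.shape, D.shape)
```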

  4. LSA again • A encodes the representation of each text block in a space of themes. • B encodes each theme with term weights. It can be used to explicitly describe the theme. [Diagram: the factorization M = A D B from the previous slide]

  5. LSA limitations • LSA has a few assumptions that don’t make much sense: • If documents really do comprise different “themes” there shouldn’t be negative weights in the LSA matrices. • LSA implicitly models Gaussian random processes for theme and word generation. Actual document statistics are far from Gaussian. • SVD forces themes to be orthogonal in the A and B matrices. Why should they be?

  6. Non-negative Matrix Factorization • NMF addresses the non-negativity and orthogonality problems, but still uses Gaussian statistics: • If documents really do comprise different “themes” there shouldn’t be negative weights in the LSA matrices. • LSA implicitly models Gaussian random processes for theme and word generation. Actual document statistics are far from Gaussian. • SVD forces themes to be orthogonal in the A and B matrices. Why should they be?

  7. LSA again • The consequences are: • LSA themes are not meaningful beyond the first few (the ones with the strongest singular values). • LSA is largely insensitive to the choice of semantic space (most 300-dim spaces will do).

  8. NMF • The corresponding properties: • NMF components track themes well (up to 30 or more). • The NMF components can be used directly as topic markers, so the choice of semantic space is important.
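
A hedged sketch of using the components as topic markers (same assumed toy corpus and library as above): because every weight in B is non-negative, the largest entries in each row can be listed directly as that theme’s marker terms.

```python
# Fit NMF on a text-block x term matrix and read topic markers straight off
# the non-negative components. Corpus and library choice are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "the cat sat on the mat",
    "dogs and cats make good pets",
    "stock prices fell sharply on monday",
    "the market rallied after the earnings news",
]
vectorizer = TfidfVectorizer()
M = vectorizer.fit_transform(docs)

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
A = nmf.fit_transform(M)        # text blocks x themes, all entries >= 0
B = nmf.components_             # themes x terms, all entries >= 0

terms = vectorizer.get_feature_names_out()
for i, row in enumerate(B):
    top = row.argsort()[::-1][:3]          # highest-weighted terms in theme i
    print(f"theme {i}:", [terms[j] for j in top])
```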

  9. NMF • NMF is an umbrella term for several algorithms. • The one in this paper uses least squares to match the original term matrix, i.e. it minimizes (M – AB)². • Another natural metric is the KL or Kullback-Leibler divergence. The KL-divergence between two probability distributions p and q is KL(p ‖ q) = Σ_i p_i log(p_i / q_i). • Another natural version of NMF uses the KL-divergence between M and its approximation A B.
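
The two objectives can be contrasted in a small sketch; the random matrix is stand-in data, and scikit-learn’s beta_loss switch is one concrete way to select the objective, not necessarily the implementation used in the paper.

```python
# Least-squares NMF versus KL-divergence NMF, plus the KL formula itself.
# The random matrix M stands in for a text-block x term count matrix.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
M = rng.random((6, 8))

# Least-squares objective: minimize (M - AB)^2, as on the slide.
A_ls = NMF(n_components=2, beta_loss="frobenius", random_state=0,
           max_iter=500).fit_transform(M)

# KL objective: minimize a (generalized) KL divergence between M and AB;
# scikit-learn requires the multiplicative-update solver for this loss.
A_kl = NMF(n_components=2, beta_loss="kullback-leibler", solver="mu",
           random_state=0, max_iter=500).fit_transform(M)

# KL divergence between two discrete distributions: sum_i p_i log(p_i / q_i).
def kl_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))   # > 0; zero only when p == q
```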

  10. NMF • KL-divergence is usually a more accurate way to compare probability distributions. • However, in clustering applications, the quality of fit to the probability distribution is secondary to the quality of the clusters. • KL-divergence NMF performs well for smoothing (extrapolation) tasks, but not as well as least-squares for clustering. • The reasons are not entirely clear, but it may simply be an artifact of the basic NMF recurrences, which find only locally-optimal matches.

  11. A Simpler Text Summarizer • A simpler text summarizer based on inter-sentence analysis did as well as any of the custom systems on the DUC-2002 dataset (Document Understanding Conference). • This algorithm, called “TextRank”, is based on analyzing the similarity graph between the sentences of the text.

  12. A Simpler Text Summarizer • Vertices in the graph represent sentences; edge weights are the similarity between sentences. [Diagram: a weighted similarity graph over sentences S1–S7]

  13. TextRank • TextRank computes vertex strength using a variant of Google’s PageRank. It gives the probability of being at a vertex during a long random walk on the graph. [Diagram: the same similarity graph over sentences S1–S7]

  14. TextRank • The highest-ranked vertices comprise the summary. • TextRank achieved the same summary performance as the best single-sentence summarizers at DUC-2002. (TextRank appeared in ACL 2004.)
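
A hedged sketch of the pipeline from slides 12 to 14: build a sentence-similarity graph, score vertices with PageRank, and keep the top-ranked sentences. The toy sentences, the tf-idf cosine similarity, and the networkx/scikit-learn calls are assumptions; the TextRank paper defines its own word-overlap similarity measure.

```python
# TextRank-style summarization sketch: sentences as vertices, similarity as
# edge weights, PageRank scores as vertex strength. Details are illustrative.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "The committee approved the new budget on Tuesday.",
    "The budget includes extra funding for road repairs.",
    "Local residents attended the public meeting.",
    "Road repairs are scheduled to begin next spring.",
]

# Edge weights: pairwise similarity between sentence vectors (one possible choice).
tfidf = TfidfVectorizer().fit_transform(sentences)
sim = cosine_similarity(tfidf)

graph = nx.Graph()
graph.add_nodes_from(range(len(sentences)))
for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        if sim[i, j] > 0:
            graph.add_edge(i, j, weight=float(sim[i, j]))

# PageRank approximates the probability of sitting at each vertex during a
# long random walk on the weighted graph.
scores = nx.pagerank(graph, weight="weight")

# The highest-ranked vertices form the summary.
top = sorted(scores, key=scores.get, reverse=True)[:2]
print([sentences[i] for i in sorted(top)])
```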

  15. Discussion Topics T1: The best text analysis algorithms for a variety of tasks seem to use numerical representations (BOW or graphical models) of texts. Discuss what information these representations capture and why they might be effective.
