T-Scroll: Visualizing Trends in a Time-Series of Documents for Interactive User Exploration


  1. T-Scroll: Visualizing Trends in a Time-Series of Documents for Interactive User Exploration Yoshiharu Ishikawa and Mikine Hasegawa Nagoya University, Japan ishikawa@itc.nagoya-u.ac.jp

  2. Outline • Background and objective • Related work • Novelty-based document clustering • Overview of T-Scroll system • Evaluation • Conclusions and future work

  3. Background • Time series of documents • Examples: news articles delivered on the Internet, online academic journals • Delivered continually, every day • Problems • A large number of documents: appropriate summarization is required • Topics change over time: topic detection/tracking and trend extraction are useful

  4. Objectives • Development and evaluation of T-Scroll (Trend/Topic-Scroll) • User interface for visualizing the transition of topics extracted from a time series of documents • System features • Built on a document clustering system that periodically outputs new clustering results • Clusters are displayed along the time axis like a scroll • Links are shown between related clusters to represent topic transitions • Several useful features for interactive exploratory analysis

  5. Outline • Background and objective • Related work • Novelty-based document clustering • Overview of T-Scroll system • Evaluation • Conclusions and future work

  6. Visualization of a time-series of documents • A few systems exist for visualizing trends in a time-series of documents • ThemeRiver (Havre et al., IEEE Trans. VCG, 2002) [4] • Visualizes topic streams like a river • Focuses on visual impact • No features for analysis and browsing • TimeMine (Swan and Allan, SIGIR’00) [5] • Extracts topics from a time-series of documents • Displays timelines to represent topics on the screen

  7. ThemeRiver Analysis of the articles related to Cuba (1960 – 1961)

  8. TimeMine • Swan & Allan (U. of Massachusetts)

  9. Analysis of time-dependent clusters • Mei & Zhai (KDD’05) [6] • Statistical approach for discovering major topics from a time-series of documents • Probabilistic modeling • MONIC (Spiliopoulou et al., KDD’06) [7] • Detects various types of patterns from cluster transitions • Examples: splitting/merging of clusters, cluster size changes • Based on the analysis of historical snapshots of clusters

  10. Outline • Background and objective • Related work • Novelty-based document clustering • Overview of T-Scroll system • Evaluation • Conclusions and future work

  11. Novelty-based document clustering (1) • Developed by our group (ECDL’01 [8], WWW Journal 2007 [10] etc.) • Clusters documents incrementally based on their similarity and novelty • Features • Similarity considers novelty • Assign high weights to recent documents, low weights to old ones • Document weights decay as time passes: Based on the concept of obsolescence (aging) • Delete old documents whose weights are smaller than the threshold • Incremental processing: low update cost

  12. Novelty-based document clustering (2) • Periodic clustering is performed on a time-series of documents [Figure: clusters placed along a time axis, labeled “Yeltsin’s Death”, “New President Sarkozy”, “Blair to Resign”, and “Other articles”; callout: “Yeltsin’s Death” and other documents are obsolete!]

  13. Document similarity (1) [Figure: document weight dw_i plotted against time t; the weight is 1 at T_i, the acquisition time of document d_i, and decays toward the current time τ] • Assumption: each delivered document gradually loses its value as time passes • dw_i: the weight of document d_i at time τ • λ (0 < λ < 1): the forgetting factor, which determines the forgetting speed • The weight of a document decreases exponentially as time passes.
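The decay described on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the weight takes the form factor ** (current time - acquisition time), which matches "weight 1 at acquisition time" and "exponential decrease"; the function names, the time unit, and the threshold value are hypothetical and not taken from the slides.

```python
def document_weight(acquired_at: float, now: float, forgetting_factor: float) -> float:
    """Weight of a document acquired at `acquired_at`, evaluated at time `now`.

    Assumed form: forgetting_factor ** (now - acquired_at), so a document
    has weight 1 when it arrives and decays exponentially afterwards.
    """
    if not 0.0 < forgetting_factor < 1.0:
        raise ValueError("forgetting factor must lie strictly between 0 and 1")
    if now < acquired_at:
        raise ValueError("current time must not precede the acquisition time")
    return forgetting_factor ** (now - acquired_at)


def prune_obsolete(acquired: dict, now: float, forgetting_factor: float,
                   threshold: float) -> dict:
    """Keep only documents whose weight has not fallen below `threshold`.

    `acquired` maps a (hypothetical) document id to its acquisition time.
    """
    return {
        doc_id: t
        for doc_id, t in acquired.items()
        if document_weight(t, now, forgetting_factor) >= threshold
    }
```

A smaller forgetting factor makes documents obsolete sooner; with a factor of 0.5 per day, a document is at one quarter of its initial weight after two days.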

  14. Document similarity (2) • Similarity score of documents di and dj • Based on novelty of documents and word occurrence patterns in the documents. • Extension of the tf-idf method • New documents have high impact on the clustering result • Document clustering: k-means method
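The transcript does not include the similarity formula itself, so the following is only an illustrative sketch of how novelty could enter a tf-idf style similarity: a plain cosine similarity over tf-idf vectors, multiplied by the weights of both documents. The multiplicative form, the smoothed idf, and all function names are assumptions, not the authors' definition.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents (lists of terms) into sparse tf-idf dicts.

    Uses a smoothed idf, log((1 + n) / (1 + df)) + 1, so that no term
    gets a zero weight; this choice is an assumption for the sketch.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def novelty_similarity(u, v, weight_u, weight_v):
    """Assumed form: cosine similarity scaled by both document weights."""
    return weight_u * weight_v * cosine(u, v)
```

Under this assumed form, two old documents with identical wording score lower than two recent ones, which is one way for new documents to have a high impact on the clustering result.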

  15. Outline • Background and objective • Related work • Novelty-based document clustering • Overview of T-Scroll system • Evaluation • Conclusions and future work

  16. T-Scroll: Idea • Periodic clustering results are displayed like a scroll • Links represent related cluster pairs

  17. System functionalities (1) • Cluster labels: selected based on the formula • Pr(di): document weight, tfij: term frequency count • Cluster sizes: ellipse size roughly corresponds to the number of documents • Links: If the score is greater than the threshold, links are shown
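The label and link formulas are not reproduced in the transcript. As a rough sketch only, the code below assumes that a term's label score is the sum over documents of document weight times term frequency, and that the link score between two clusters is the Jaccard overlap of their document ids. Both scoring rules and all names are hypothetical stand-ins, not the formulas used by T-Scroll.

```python
from collections import Counter


def cluster_label(docs, top_k=3):
    """Pick the top_k terms of a cluster as its label.

    `docs` is a list of (document_weight, Counter of term frequencies).
    Assumed score of a term: sum of weight * term frequency over the documents.
    """
    scores = Counter()
    for weight, tf in docs:
        for term, count in tf.items():
            scores[term] += weight * count
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def linked_pairs(previous, current, threshold):
    """Return cluster pairs whose link score exceeds `threshold`.

    `previous` and `current` map cluster names to sets of document ids.
    Assumed link score: Jaccard overlap of the two document sets.
    """
    links = []
    for p_name, p_docs in previous.items():
        for c_name, c_docs in current.items():
            union = p_docs | c_docs
            score = len(p_docs & c_docs) / len(union) if union else 0.0
            if score > threshold:
                links.append((p_name, c_name))
    return links
```

Raising the threshold thins out the links, leaving only strongly related clusters connected between consecutive clustering periods.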

  18. System functionalities (2) • Cluster quality: visualized using different colors for the cluster border lines • red (good) purple (bad) • A high score is achieved if (1) the cluster size is large, and (2) the documents contained in the cluster are similar

  19. System functionalities (3) • Drill-down/roll-up: the user can interactively specify the interval between two consecutive clusterings (e.g., one day, one week) • Displaying keyword list: the user can browse the keyword list for a specified cluster • Access to original documents • Keyword-based emphasis: clusters that contain a user-specified keyword are emphasized

  20. Demo

  21. System implementation • T-Scroll module • Written in Perl: generates an SVG file • Browser displays the generated SVG file • SVG file includes scripts (JavaScript) • Used for interactive manipulation • Clustering module • Written in Ruby • Novelty-based incremental document clustering
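To make the "generate an SVG file" step concrete, here is a small sketch in Python rather than Perl. It lays out one column per clustering period, draws each cluster as an ellipse whose size grows with the number of documents, and draws a line for each link. The layout constants, the sizing rule, and the function name are assumptions for illustration; the embedded JavaScript for interactive manipulation is omitted.

```python
from xml.sax.saxutils import escape


def render_scroll(periods, links, col_width=160, row_height=80):
    """Render clusters along a horizontal time axis as an SVG string.

    `periods` is a list (one entry per clustering period) of lists of
    (label, size) tuples; `links` is a list of ((period, index),
    (period, index)) pairs naming related clusters.
    """
    def center(period, index):
        return (col_width * period + col_width // 2,
                row_height * index + row_height // 2)

    parts = []
    # Draw links first so the ellipses are painted on top of them.
    for (p, i), (q, j) in links:
        x1, y1 = center(p, i)
        x2, y2 = center(q, j)
        parts.append(
            f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" stroke="gray"/>')
    for p, clusters in enumerate(periods):
        for i, (label, size) in enumerate(clusters):
            cx, cy = center(p, i)
            rx = 20 + 2 * size  # ellipse size roughly follows cluster size
            parts.append(
                f'<ellipse cx="{cx}" cy="{cy}" rx="{rx}" ry="{rx // 2}" '
                f'fill="white" stroke="red"/>')
            parts.append(
                f'<text x="{cx}" y="{cy}" text-anchor="middle">'
                f'{escape(label)}</text>')
    width = col_width * len(periods)
    height = row_height * max((len(c) for c in periods), default=1)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">' + "".join(parts) + "</svg>")
```

Writing the returned string to a `.svg` file and opening it in a browser shows the scroll-like layout; labels are escaped so that characters such as `&` do not break the XML.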

  22. System architecture [Diagram: an RSS Feed Module and a Clustering Module take news articles as input and output the clustering result, which is the input to T-Scroll. T-Scroll consists of the T-Scroll Main Module, the SVG Output Module, and the SVG Control Module (Perl). Its output is an SVG file (includes JavaScript) that the browser shows through a plug-in as the cluster display. The user interacts through command inputs (Perl) and interactive manipulation (JavaScript).]

  23. Outline • Background and objective • Related work • Novelty-based document clustering • Overview of T-Scroll system • Evaluation • Conclusions and future work

  24. Evaluation • 10 users • Data set • Japanese news articles collected from news web sites from Sept. 2006 to Feb. 2007 • 100 articles per day • Clustering was performed at six-hour intervals • Evaluation criteria • Overall impressions • Evaluation of each function • Observability of topics • Comparison with ThemeRiver

  25. Overall impression • Users give scores from 0 to 5

  26. Evaluation on each function

  27. Observability of topics (1) • Can users observe major topics in Nov. 2006? • Five major topics were specified by us: each user scores how clearly he or she can observe each topic

  28. Observability of topics (2) • 10 users (different from the former experiments) • Users report the topics they observed and their scores, with no information given in advance • Topics 1 to 5 are the major topics used in the previous experiments • Topic 2 (big hurricane) was regarded as a normal weather topic

  29. Comparison with ThemeRiver (1) • A ThemeRiver-like display figure was manually created for news articles in Dec. 2006 • 11 users (different from previous experiments) • Questions to users • Overall impressions • Observability of topics

  30. Comparison with ThemeRiver (2) • Overall impression

  31. Comparison with ThemeRiver (3) • Can users observe five major topics that we selected?

  32. Summary of experiments • Overall impressions • Good, but usability improvements are required • Some users commented on the response speed • System functionalities • Several features (quality info, article lists, etc.) are useful in practice • Appropriate labels are necessary: label selection should be improved • Comparison with ThemeRiver • ThemeRiver has visual impact, but its display tends to become complicated when there are many topics

  33. Outline • Background and objective • Related work • Novelty-based document clustering • Overview of T-Scroll system • Evaluation • Conclusions and future work

  34. Conclusions and future work • Development and evaluation of T-Scroll system • Based on novelty-based incremental clustering method • Scroll-like display for showing changing trends • Several features for interactive analysis • Evaluation • Overall impression • Observability of topics • Comparison with ThemeRiver • Future work • Sophisticated keyword (label) selection • Improvement of interactive speed
