
Indexing and Data Mining in Multimedia Databases



Presentation Transcript


  1. Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU www.cs.cmu.edu/~christos

  2. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • New tools for Data Mining: Fractals • Conclusions • Resources C. Faloutsos

  3. Problem Given a large collection of (multimedia) records, find similar/interesting things, i.e.: • Allow fast, approximate queries, and • Find rules/patterns

  4. Sample queries • Similarity search • Find pairs of branches with similar sales patterns • Find medical cases similar to Smith's • Find pairs of sensor series that move in sync • Find shapes like a spark-plug

  5. Sample queries (cont’d) • Rule discovery • Clusters (of branches; of sensor data; ...) • Forecasting (total sales for next year?) • Outliers (e.g., unexpected part failures; fraud detection)

  6. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • New tools for Data Mining: Fractals • Conclusions • Related projects @ CMU and resources

  7. Indexing - Multimedia Problem: • given a set of (multimedia) objects, • find the ones similar to a desired query object

  8. [Figure: three time series, $price vs. day (1 to 365)] distance function: by expert

  9. ‘GEMINI’ - Pictorially [Figure: each sequence S1, ..., Sn ($price vs. day, 1 to 365) is mapped to a point F(S1), ..., F(Sn) in a feature space with axes such as avg and std]
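
The GEMINI picture above can be sketched as a filter-and-refine range query. This is a minimal illustration, not the author's implementation: it assumes the expert's distance is plain Euclidean distance, uses the (avg, std) features named on the slide, and replaces the spatial access method with a linear scan over a precomputed feature table. The names `features`, `build_feature_table`, and `range_query` are hypothetical. For sequences of length n, sqrt(n) times the feature-space distance never exceeds the Euclidean distance, so the filter step cannot dismiss a true match.

```python
import numpy as np


def features(seq):
    """Map a sequence to a 2-d feature point (avg, population std)."""
    return np.array([seq.mean(), seq.std()])


def build_feature_table(sequences):
    """Precompute F(S1), ..., F(Sn); a real system would index these points."""
    return np.array([features(s) for s in sequences])


def range_query(query, sequences, feature_table, eps):
    """Return indices of sequences within Euclidean distance eps of query."""
    n = len(query)
    fq = features(query)
    # Filter step: cheap lower bound on the true distance, in feature space.
    lower_bound = np.sqrt(n) * np.linalg.norm(feature_table - fq, axis=1)
    candidates = np.flatnonzero(lower_bound <= eps)
    # Refine step: compute the actual distance only for surviving candidates.
    return [int(i) for i in candidates
            if np.linalg.norm(sequences[i] - query) <= eps]
```

Because the feature distance is a lower bound, the answer is identical to a full sequential scan; only the number of expensive distance computations changes.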

  10. Remaining issues • How to extract features automatically? • How to merge similarity scores from different media?

  11. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • Visualization: FastMap • Relevance feedback: FALCON • Data Mining / Fractals • Conclusions

  12. FastMap ?? [Figure labels: ~100, ~1]

  13. FastMap • Multi-dimensional scaling (MDS) can do that, but in O(N^2) time • We want a linear algorithm: FastMap [SIGMOD95]
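
A minimal sketch of the FastMap idea follows, assuming only that a distance function between objects is available. Each output coordinate is obtained by projecting every object onto the line through two far-apart pivot objects using the cosine law, and later coordinates work on the residual distances. The function name `fastmap`, its parameters, and the pivot heuristic details (one random start, two farthest-point hops) are illustrative choices, not taken from the slides.

```python
import numpy as np


def fastmap(objects, dist, k, seed=0):
    """Embed objects into k dimensions using O(N * k) distance computations."""
    n = len(objects)
    coords = np.zeros((n, k))
    rng = np.random.default_rng(seed)

    def d2(i, j, col):
        # Squared distance left over after the first `col` coordinates.
        base = dist(objects[i], objects[j]) ** 2
        diff = coords[i, :col] - coords[j, :col]
        return max(base - float(diff @ diff), 0.0)

    for col in range(k):
        # Pivot heuristic: start anywhere, hop to the farthest object twice.
        a = int(rng.integers(n))
        b = max(range(n), key=lambda j: d2(a, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 <= 1e-12:
            break  # all residual distances are (numerically) zero
        dab = dab2 ** 0.5
        for i in range(n):
            # Cosine law: position of object i along the pivot line a-b.
            coords[i, col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords
```

For points that already live in a low-dimensional Euclidean space, k equal to that dimensionality reproduces the pairwise distances; for general distance functions the embedding is only approximate.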

  14. Applications: time sequences • given n co-evolving time sequences • visualize them + find rules [ICDE00] [Figure: rate vs. time for DEM, JPY, HKD]

  15. Applications - financial • currency exchange rates [ICDE00] [Figure labels: FRF, GBP, JPY, HKD, USD(t), USD(t-5)]

  16. Applications - financial • currency exchange rates [ICDE00] [Figure labels: FRF, DEM, HKD, JPY, USD, GBP, USD(t), USD(t-5)]

  17. Application: VideoTrails [ACM MM97]

  18. VideoTrails - usage • scene-cut detection (about 10% errors) • scene classification (e.g., dialogue vs action)

  19. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • Visualization: FastMap • Relevance feedback: FALCON • Data Mining / Fractals • Conclusions

  20. Merging similarity scores • e.g., video: text, color, motion, audio • weights change with the query! • solution 1: user specifies weights • solution 2: user gives examples • and we ‘learn’ what he/she wants: relevance feedback (Rocchio, MARS, MindReader) • but: how about disjunctive queries?

  21. ‘FALCON’ [Figure labels: Vs; Inverted Vs] Trader wants only ‘unstable’ stocks

  22. “Single query point” methods [Figure: scatter of ‘+’ points with one ‘x’; panel: Rocchio]

  23. “Single query point” methods [Figure: three panels of ‘+’ points, each with one ‘x’; panels: Rocchio, MindReader, MARS] The averaging effect in action...

  24. Main idea: FALCON Contours [Wu+, VLDB 2000] [Figure: ‘+’ points; axes: feature1 (e.g., temperature) and feature2 (e.g., frequency)]
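
The contrast between a single query point and contours around several good examples can be illustrated with a small sketch. The slides do not give a formula, so the aggregate below is an assumption made for illustration: a power mean of the distances to the good examples with a negative exponent `alpha`, which behaves like a soft minimum. The function names and the default `alpha=-5.0` are likewise hypothetical.

```python
import numpy as np


def aggregate_dissimilarity(x, good_points, alpha=-5.0):
    """Soft-minimum distance from x to a set of good examples (assumed form)."""
    d = np.linalg.norm(good_points - x, axis=1)
    if np.any(d == 0):
        return 0.0  # x coincides with a good example
    return float(np.mean(d ** alpha) ** (1.0 / alpha))


def centroid_distance(x, good_points):
    """Single-query-point baseline: distance from x to the average example."""
    return float(np.linalg.norm(good_points.mean(axis=0) - x))
```

With good examples in two separate clusters, the centroid falls in the empty region between them, so the single-point baseline ranks that region best. The soft minimum stays small near either cluster, which is what a disjunctive query ("like this group OR like that group") needs.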

  25. Conclusions for indexing + visualization • GEMINI: fast indexing, exploiting off-the-shelf SAMs • FastMap: automatic feature extraction in O(N) time • FALCON: relevance feedback for disjunctive queries

  26. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • New tools for Data Mining: Fractals • Conclusions • Resources

  27. Data mining & fractals – Road map • Motivation – problems / case study • Definition of fractals and power laws • Solutions to posed problems • More examples

  28. Problem #1 - spatial d.m. Galaxies (Sloan Digital Sky Survey w/ B. Nichol) • ‘spiral’ and ‘elliptical’ galaxies (stores & households; mpg & MTBF...) • patterns? (not Gaussian; not uniform) • attraction/repulsion? • separability?

  29. Problem #2: dim. reduction • given attributes x1, ..., xn • possibly, non-linearly correlated • drop the useless ones (Q: why? A: to avoid the ‘dimensionality curse’)

  30. Answer: • Fractals / self-similarities / power laws

  31. What is a fractal? = self-similar point set, e.g., Sierpinski triangle: zero area; infinite length! ...

  32. Definitions (cont’d) • Paradox: Infinite perimeter; Zero area! • ‘dimensionality’: between 1 and 2 • actually: log(3)/log(2) = 1.58… (long story)

  33. Intrinsic (‘fractal’) dimension Q: fractal dimension of a line? E.g.: #cylinders; miles / gallon

  34. Intrinsic (‘fractal’) dimension Q: fractal dimension of a line? A: nn(<= r) ~ r^1, where nn(<= r) is the number of neighbors within radius r (‘power law’: y = x^a)

  35. Intrinsic (‘fractal’) dimension Q: fractal dimension of a line? A: nn(<= r) ~ r^1 (‘power law’: y = x^a) Q: fd of a plane? A: nn(<= r) ~ r^2 fd == slope of (log(nn) vs log(r))

  36. Sierpinski triangle [Figure: log(#pairs within <= r) vs. log(r), i.e., the ‘correlation integral’; slope 1.58]
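
The "slope of log(#pairs within <= r) vs. log(r)" recipe can be tried out with a short sketch. It counts pairs by brute force (quadratic in the number of points, so only suitable for small samples) and fits a straight line in log-log space. The function names, the sample generator for the Sierpinski triangle, and the choice of radii are assumptions made for illustration.

```python
import numpy as np


def correlation_dimension(points, radii):
    """Estimate the fractal dimension as the log-log slope of the pair count."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[np.triu_indices(len(points), k=1)]  # each pair once
    counts = np.array([(pair_dist <= r).sum() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return float(slope)


def sierpinski_points(n, rng, depth=20):
    """Sample n points of the Sierpinski triangle via random vertex choices."""
    vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
    idx = rng.integers(0, 3, size=(n, depth))
    weights = 0.5 ** np.arange(1, depth + 1)
    return (vertices[idx] * weights[None, :, None]).sum(axis=1)
```

On uniformly random points the estimate should come out near 1 for a line segment and near 2 for a filled square, and near log(3)/log(2) for the Sierpinski sample. Finite samples and boundary effects pull the estimates slightly below the ideal values.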

  37. Road map • Motivation – problems / case studies • Definition of fractals and power laws • Solutions to posed problems • More examples • Conclusions

  38. Solution#1: spatial d.m. Galaxies (Sloan Digital Sky Survey w/ B. Nichol - ‘BOPS’ plot - [SIGMOD 2000]) • clusters? • separable? • attraction/repulsion? • data ‘scrubbing’ – duplicates?

  39. Solution#1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell] - 1.8 slope - plateau! - repulsion!

  40. Solution#1: spatial d.m. [w/ Seeger, Traina, Traina, SIGMOD00] [Figure: log(#pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell] - 1.8 slope - plateau! - repulsion!

  41. spatial d.m. Heuristic on choosing # of clusters [Figure labels: r1, r2]

  42. Solution#1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell] - 1.8 slope - plateau! - repulsion!

  43. Solution#1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell] • 1.8 slope • plateau! • repulsion!! • duplicates

  44. Problem #2: Dim. reduction

  45. Solution: • drop the attributes that don’t increase the ‘partial f.d.’ (PFD) • definition: the PFD of attribute set A is the f.d. of the projected cloud of points [w/ Traina, Traina, Wu, SBBD00]
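
The rule "drop the attributes that don't increase the partial f.d." can be sketched as a greedy forward pass. This is a simplified illustration and not necessarily the procedure of the cited paper: it estimates each partial fractal dimension with a brute-force pair count, scans attributes in their given order, and keeps an attribute only if it raises the PFD of the kept set by more than a threshold. The function names and the `threshold=0.3` default are assumptions, and the data is assumed to be scaled to roughly the unit cube.

```python
import numpy as np


def correlation_dimension(points, radii):
    """Log-log slope of the number of pairs within distance r."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[np.triu_indices(len(points), k=1)]
    counts = np.array([(pair_dist <= r).sum() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return float(slope)


def select_attributes(data, radii, threshold=0.3):
    """Keep attributes (column indices) that increase the partial f.d."""
    kept = []
    current = 0.0  # PFD of the empty attribute set
    for j in range(data.shape[1]):
        pfd = correlation_dimension(data[:, kept + [j]], radii)
        if pfd - current > threshold:
            kept.append(j)
            current = pfd
    return kept
```

An attribute that is a (possibly non-linear) function of attributes already kept leaves the projected cloud on the same curve or surface, so the PFD barely moves and the attribute is dropped. An independent attribute raises the PFD by about one.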

  46. Problem #2: dim. reduction [Figure labels: global FD=1; PFD=1; PFD~1; PFD=0; PFD=1; PFD~1]

  47. Problem #2: dim. reduction Notice: ‘max variance’ would fail here [Figure labels: global FD=1; PFD=1; PFD=1; PFD=0; PFD=1; PFD~1]

  48. Problem #2: dim. reduction Notice: SVD would fail here [Figure labels: global FD=1; PFD=1; PFD~1; PFD=0; PFD=1; PFD~1]

  49. Road map • Motivation – problems / case studies • Definition of fractals and power laws • Solutions to posed problems • More examples • fractals • power laws • Conclusions

  50. Disk traffic [Figure: #bytes vs. time] • Not Poisson, not(?) iid - BUT: self-similar • How to model it?
