
Scalable and Distributed Similarity Search in Metric Spaces

Michal Batko, Claudio Gennaro, Pavel Zezula


Presentation Transcript


  1. Scalable and Distributed Similarity Search in Metric Spaces • Michal Batko, Claudio Gennaro, Pavel Zezula

  2. Presentation contents • Motivation • Metric spaces and similarity searching • GHT* • Concepts • Generalized Hyperplane Tree • Distributed architecture • Experimental results • Conclusions and future work

  3. Motivation • Searching is a fundamental problem • Traditional search • Numbers or strings • Based on total linear order of keys • New approach • Free text, images, audio, video, etc. • Impossible to structure in keys and records

  4. Alternative • Similarity searching • Metric spaces

  5. Metric space • Set of objects (A) • any class of objects for which a distance can be computed • for example text, audio or video files • Metric function (d) • positive • reflexive • symmetric • triangle inequality
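The four properties listed on this slide can be spot-checked on a finite sample of objects. The sketch below is an illustration only, not part of the presentation: the function name `satisfies_metric_axioms`, the tolerance `tol`, and the choice of Euclidean points as sample objects are all assumptions. Passing the check on a sample does not prove that `d` is a metric, but a failure does show that it is not.

```python
import itertools
import math


def satisfies_metric_axioms(d, sample, tol=1e-9):
    """Spot-check the metric axioms for distance function d on a finite sample.

    Hypothetical helper for illustration; assumes sample objects support == and !=.
    """
    for x, y in itertools.product(sample, repeat=2):
        dxy = d(x, y)
        if x == y and abs(dxy) > tol:      # reflexive: d(x, x) = 0
            return False
        if x != y and dxy <= tol:          # positive: d(x, y) > 0 for x != y
            return False
        if abs(dxy - d(y, x)) > tol:       # symmetric: d(x, y) = d(y, x)
            return False
    for x, y, z in itertools.product(sample, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True


def squared_euclidean(a, b):
    """Not a metric: it violates the triangle inequality."""
    return math.dist(a, b) ** 2


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(satisfies_metric_axioms(math.dist, points))
print(satisfies_metric_axioms(squared_euclidean, points))
```

The Euclidean distance passes, while the squared Euclidean distance fails on the collinear points because going directly from the first to the last point costs more than going through the middle one.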

  6. Similarity searching • Range search • all objects within distance r of the query object Q • k-nearest neighbor search • the k objects nearest to the query object Q
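Both query types can be stated as a linear scan over the data set, which is a useful baseline for checking any index. This is a minimal sketch under assumed names (`range_search`, `knn_search`); the presentation does not give code, and an index such as GHT* exists precisely to avoid computing every distance as this scan does.

```python
import heapq
import math


def range_search(objects, d, q, r):
    """Return every object whose distance from query q is at most r."""
    return [o for o in objects if d(q, o) <= r]


def knn_search(objects, d, q, k):
    """Return the k objects closest to query q, nearest first."""
    return heapq.nsmallest(k, objects, key=lambda o: d(q, o))


objects = [(0, 0), (1, 0), (5, 5), (2, 2)]
print(range_search(objects, math.dist, (0, 0), 1.5))
print(knn_search(objects, math.dist, (0, 0), 2))
```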

  7. GHT* – concepts • Data distributed among servers • Multiple buckets with limited capacity • Clients perform updates and search • Bucket location algorithm • Based on DDH and DST algorithms • Exploits Generalized Hyperplane Tree

  8. Generalized Hyperplane Tree • Single-site metric space indexing structure • Allows similarity searching and is scalable • Binary search tree • Data stored in leaf nodes • Inner nodes for routing • Two “pivots” per node • (figure: example with objects p1–p14)
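The structure described on this slide can be sketched as a binary tree whose leaves hold buckets and whose inner nodes hold two pivots. The code below is an illustrative sketch, not the authors' implementation: the class name `Node`, the bucket `capacity`, and the pivot choice (the first object in the bucket plus the first object distinct from it) are assumptions. The pruning test in `range_search` is the standard generalized-hyperplane rule that follows from the triangle inequality.

```python
import math


class Node:
    """A GHT node: a leaf with a bucket, or an inner node with two pivots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.bucket = []     # objects; used only while the node is a leaf
        self.pivots = None   # (p1, p2) once the node has been split
        self.left = None     # subtree of objects at least as close to p1
        self.right = None    # subtree of objects strictly closer to p2

    def is_leaf(self):
        return self.pivots is None


def insert(node, obj, d):
    # Route through inner nodes: go toward the closer pivot.
    while not node.is_leaf():
        p1, p2 = node.pivots
        node = node.left if d(obj, p1) <= d(obj, p2) else node.right
    node.bucket.append(obj)
    if len(node.bucket) > node.capacity:
        split(node, d)


def split(node, d):
    # Assumed pivot choice: first object and the first one distinct from it.
    p1 = node.bucket[0]
    p2 = next((o for o in node.bucket[1:] if d(p1, o) > 0), None)
    if p2 is None:
        return  # all objects coincide; leave the bucket oversized
    objs, node.bucket = node.bucket, []
    node.pivots = (p1, p2)
    node.left, node.right = Node(node.capacity), Node(node.capacity)
    for o in objs:
        insert(node, o, d)  # the pivots themselves also end up in leaves


def range_search(node, q, r, d):
    if node.is_leaf():
        return [o for o in node.bucket if d(q, o) <= r]
    p1, p2 = node.pivots
    d1, d2 = d(q, p1), d(q, p2)
    found = []
    if d1 - r <= d2 + r:   # the query ball may reach the p1 side
        found += range_search(node.left, q, r, d)
    if d1 + r >= d2 - r:   # the query ball may reach the p2 side
        found += range_search(node.right, q, r, d)
    return found


root = Node(capacity=4)
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
for p in grid:
    insert(root, p, math.dist)
print(sorted(range_search(root, (4.2, 4.7), 1.5, math.dist)))
```

Because two distinct pivots always land on opposite sides, each child receives at least one object after a split, so the children never overflow immediately.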

  9. GHT* – distributed architecture • GHT is used as search structure • Leaf node represents a server • unique server identifier • servers extend the tree with leaf nodes for their local buckets • Inner nodes store routing information • GHT is replicated • GHT can be inaccurate • Update (image adjustment) messages
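To make the routing role of the replicated tree concrete, the sketch below shows how a client might use its local copy to decide which servers to contact. It is a simplified illustration under stated assumptions: the names `AstNode`, `locate_for_insert`, and `locate_for_range` are hypothetical, every leaf holds only a server identifier, and the update (image adjustment) messages that correct an out-of-date client copy are not modeled.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AstNode:
    """Client-side routing node: inner nodes hold pivots, leaves a server id."""
    pivots: Optional[Tuple[tuple, tuple]] = None
    left: Optional["AstNode"] = None
    right: Optional["AstNode"] = None
    server_id: Optional[int] = None


def locate_for_insert(node, obj, d):
    """Follow the closer pivot at each inner node; return one server id."""
    while node.pivots is not None:
        p1, p2 = node.pivots
        node = node.left if d(obj, p1) <= d(obj, p2) else node.right
    return node.server_id


def locate_for_range(node, q, r, d):
    """Return the set of servers whose region the query ball may intersect."""
    if node.pivots is None:
        return {node.server_id}
    p1, p2 = node.pivots
    d1, d2 = d(q, p1), d(q, p2)
    servers = set()
    if d1 - r <= d2 + r:
        servers |= locate_for_range(node.left, q, r, d)
    if d1 + r >= d2 - r:
        servers |= locate_for_range(node.right, q, r, d)
    return servers


# A made-up three-server tree, used only to exercise the functions.
tree = AstNode(
    pivots=((0.0, 0.0), (10.0, 0.0)),
    left=AstNode(server_id=1),
    right=AstNode(
        pivots=((10.0, 0.0), (10.0, 10.0)),
        left=AstNode(server_id=2),
        right=AstNode(server_id=3),
    ),
)
print(locate_for_insert(tree, (9.0, 1.0), math.dist))
print(sorted(locate_for_range(tree, (5.0, 0.0), 1.0, math.dist)))
```

In this sketch the client computes two distances per inner node it visits, which matches the claim that only a few distance computations are needed to locate servers.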

  10. GHT* – distributed architecture

  11. Experimental results – inserting • Preliminary phase • Tests for vector space with Euclidean distance function

  12. Experimental results – searching 20 range queries with radius 50 points (match approx. 3 objects)

  13. Conclusions • First structure for scalable distributed similarity search • Satisfies properties of SDDS • Scalability – can expand to new servers through autonomous splits • No hot-spot – all clients use as precise addressing as possible and learn from misaddressing • Updates are local and never require updates to multiple clients • Client performs only a few distance computations to locate servers

  14. Future work • More experiments • Different metric spaces • More complex evaluation • Additional evaluated properties • Nearest neighbor search • Algorithm for parallel processing to better utilize distributed structure • Experimental evaluation

  15. Questions?
