
P2P Search



Presentation Transcript


  1. P2P Search COP6731 Advanced Database Systems

  2. P2P Computing • Powerful personal computers share computing resources • Advantages: • Shared infrastructure costs • Highly scalable • No single point of failure (SPOF) • Censorship resistance

  3. P2P Search Techniques • Centralized P2P systems • e.g., Napster, SETI@home • Decentralized & unstructured P2P systems • e.g., Gnutella • Hybrid - partially decentralized • e.g., Freenet • Structured P2P systems • DHT systems (CAN/Chord/Pastry/Tapestry) • Skip-list-based systems

  4. Napster • MP3 file sharing with a centralized catalog • Peers hold files • Napster Inc.’s servers hold the catalog • File transfer is P2P, using a proprietary protocol

  5. Napster: Publish a File • Users upload their IP address and the music titles they wish to share to the central Napster server [Figure: peer 192.1.2.3 sends (xyz.mp3, 192.1.2.3) to the central Napster server]

  6. Napster: Query for a File • Users ask the central server which peers hold the files they want to download [Figure: a peer asks the central Napster server "xyz.mp3?" and receives 192.1.2.3]

  7. Napster: Transfer Requested File • File transfer is P2P, using a proprietary protocol [Figure: the requesting peer downloads xyz.mp3 directly from 192.1.2.3]
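Slides 4 to 7 describe a central catalog that maps song titles to the peers that share them. The following is a minimal sketch of that directory. It assumes an in-memory dictionary, and the class and method names are hypothetical, not Napster's actual protocol. Only the publish and query steps touch the server. The transfer step would happen directly between peers.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical stand-in for the central Napster catalog server."""

    def __init__(self) -> None:
        # file title -> set of IP addresses of peers that share it
        self.catalog: dict[str, set[str]] = defaultdict(set)

    def publish(self, peer_ip: str, titles: list[str]) -> None:
        # Publish: a peer uploads its IP address and the titles it shares.
        for title in titles:
            self.catalog[title].add(peer_ip)

    def query(self, title: str) -> list[str]:
        # Query: the server answers with the peers that hold the file.
        # dict.get does not insert missing keys into the defaultdict.
        return sorted(self.catalog.get(title, set()))


directory = CentralDirectory()
directory.publish("192.1.2.3", ["xyz.mp3", "abc.mp3"])
directory.publish("192.1.2.7", ["xyz.mp3"])
# Transfer is P2P: the requester would now contact one of these peers directly.
print(directory.query("xyz.mp3"))  # ['192.1.2.3', '192.1.2.7']
```

Every query in this sketch goes through one object. That is exactly the bottleneck and single point of failure that slide 8 points out.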

  8. Disadvantages of a Centralized Directory • Performance bottleneck • Single point of failure • Can we do it without a directory?

  9. Gnutella • No catalog • Pings the network to locate Gnutella peers • File requests are broadcast to peers • Flooding, i.e., breadth-first search • When a provider is located, the file is transferred via HTTP

  10. Gnutella: Issue a Request [Figure: a peer issues the query "xyz.mp3?"]

  11. Gnutella: Flood the Request

  12. Gnutella: Reply with the File [Figure: the peer holding xyz.mp3 replies]

  13. Gnutella - Disadvantages • Network flooding - unnecessary network traffic • Using a TTL - some files might not be found • Alternatively: • use ultranodes (or supernodes) • use depth-first search, e.g., Freenet
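The Gnutella search in slides 9 to 13 can be sketched as a TTL-limited breadth-first flood. The overlay graph, the file placement and the name `flood_search` are hypothetical. Real Gnutella tags each query with a unique descriptor ID so peers can drop duplicates, and the `seen` set below only approximates that. The sketch counts query messages to show the traffic cost. It also shows how a small TTL can miss a file that exists.

```python
from collections import deque


def flood_search(graph, files, start, target, ttl):
    """Breadth-first flood of a query from `start`, limited to `ttl` hops.

    graph: peer -> list of neighbor peers (hypothetical overlay)
    files: peer -> set of file names that peer shares
    Returns (sorted peers holding `target`, number of query messages sent).
    """
    seen = {start}                      # peers drop repeated copies of the query
    frontier = deque([(start, ttl)])
    hits, messages = [], 0
    while frontier:
        peer, remaining = frontier.popleft()
        if target in files.get(peer, set()):
            hits.append(peer)
        if remaining == 0:
            continue                    # TTL exhausted: stop forwarding here
        for neighbor in graph[peer]:
            messages += 1               # every forward is sent, even to seen peers
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return sorted(hits), messages


graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
files = {"E": {"xyz.mp3"}}
print(flood_search(graph, files, "A", "xyz.mp3", ttl=2))  # ([], 6)
print(flood_search(graph, files, "A", "xyz.mp3", ttl=3))  # (['E'], 9)
```

With TTL 2 the file on peer E is never reached. Raising the TTL to 3 finds it, but the number of messages grows.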

  14. Morpheus, Kazaa [Figure: supernode layer]

  15. Using Ultranodes • Queries flood only the network of ultranodes • Other peer nodes are shielded from query traffic • Combines the benefits of centralized and decentralized search • Takes advantage of the heterogeneity in peer capabilities
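Here is a sketch of the ultranode idea. It assumes each leaf peer has already registered its file list with one ultranode, so queries only flood the ultranode layer. All names and data are hypothetical, and real Kazaa/Morpheus protocols differ in detail. Each ultranode answers on behalf of its leaves from a local index. This is the "centralized" half of the hybrid.

```python
from collections import deque


def ultranode_search(ultra_graph, leaf_index, start_ultra, target, ttl):
    """Flood a query over the ultranode layer only (hypothetical sketch).

    ultra_graph: ultranode -> list of neighboring ultranodes
    leaf_index:  ultranode -> {leaf peer: set of files} for the leaves it serves
    Leaf peers never receive the query; their ultranode answers for them.
    """
    seen = {start_ultra}
    frontier = deque([(start_ultra, ttl)])
    hits = []
    while frontier:
        ultra, remaining = frontier.popleft()
        # Answer from the local index on behalf of the shielded leaves.
        for leaf, shared in leaf_index.get(ultra, {}).items():
            if target in shared:
                hits.append(leaf)
        if remaining > 0:
            for neighbor in ultra_graph[ultra]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, remaining - 1))
    return sorted(hits)


ultra_graph = {"U1": ["U2"], "U2": ["U1", "U3"], "U3": ["U2"]}
leaf_index = {
    "U1": {"p1": {"a.mp3"}},
    "U2": {"p2": {"xyz.mp3"}, "p3": {"b.mp3"}},
    "U3": {"p4": {"xyz.mp3"}},
}
print(ultranode_search(ultra_graph, leaf_index, "U1", "xyz.mp3", ttl=2))  # ['p2', 'p4']
```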

  16. Freenet - Depth-First Search

  17. Freenet – File Not Found • The requested file is not found because of a poor routing decision made at peer D • In this case, the query backs out of the dead end and tries another peer in a depth-first manner
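The back-out behavior in slides 16 and 17 is a depth-first search with backtracking. Here is a simplified sketch. Each peer's neighbor list is assumed to be already ordered by preference; real Freenet orders candidates by key closeness and also caches results, and both are left out here. The graph, files and hop limit are hypothetical. Peer D below plays the dead end.

```python
def depth_first_search(graph, files, start, target, max_hops):
    """Freenet-style depth-first query with backtracking (simplified sketch).

    graph: peer -> neighbors, already ordered by routing preference (assumed)
    Returns the path to a peer holding `target`, or None if not found.
    """
    visited = set()

    def visit(peer, hops_left, path):
        visited.add(peer)
        path = path + [peer]
        if target in files.get(peer, set()):
            return path
        if hops_left == 0:
            return None                 # hop limit reached on this branch
        for neighbor in graph[peer]:
            if neighbor in visited:
                continue
            found = visit(neighbor, hops_left - 1, path)
            if found is not None:
                return found
            # Dead end below `neighbor`: back out and try the next neighbor.
        return None

    return visit(start, max_hops, [])


graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
files = {"E": {"xyz.mp3"}}
print(depth_first_search(graph, files, "A", "xyz.mp3", max_hops=3))  # ['A', 'C', 'E']
```

The query first goes A, B, D. It finds nothing at D, backs out to A, and then succeeds via C.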

  18. Structured P2P Systems • DHT-based • Chord / Pastry / Tapestry: hash keys into a one-dimensional space • CAN: hash keys into a multi-dimensional space • P-Grid: hash keys into a virtual binary search tree • Skip-list-based • Skip Graph / SkipNet • Index-tree-based • BATON

  19. DHT Design Goals • An “overlay” network with: • Flexible mapping of keys to physical nodes • Data Independence • Small network diameter • Small degree (fan-out) • Local routing decisions • Robustness to churn • Routing flexibility • Proximity • A “storage” or “memory” mechanism with • No guarantees on persistence • Maintenance via soft state

  20. Metrics • Searching/Lookup • Number of hops in searching • Number of messages • Database related metrics: • Total disk I/O • Response Time • Accuracy • Maintenance • Number of hops • Number of messages

  21. How to Bound the Search Space? Work on placement! [Figure: network]

  22. Basic Idea - Hashing • Objects have hash keys: object “y” is published in the P2P network under H(y) • Peer nodes also have hash keys in the same hash space: peer “x” joins under H(x) • Place each object at the peer with the closest hash key

  23. Internet Viewed as a Distributed Hash Table [Figure: a hash table spanning 0 to 2^128 - 1, partitioned among peer nodes] • Each peer is responsible for a range of the hash table, according to the peer's hash key • Objects are placed at the peer with the closest key • Note that peers are at the Internet edges
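Slides 22 and 23 place peers and objects in the same 2^128 hash space and store each object at the closest peer. This is a minimal sketch of that rule. It uses MD5 only because it produces exactly 128 bits. It also assumes "closest" means circular distance on the key space, with ties going to the smaller key. Chord, for example, instead uses the first peer clockwise. The peer and object names are hypothetical.

```python
import hashlib

SPACE = 2 ** 128  # size of the hash space on the slide


def h(name: str) -> int:
    # MD5 digests are 128 bits, so every key falls in 0 .. 2^128 - 1.
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")


def ring_distance(a: int, b: int) -> int:
    # Assumption: distance wraps around, so 0 and 2^128 - 1 are neighbors.
    d = abs(a - b)
    return min(d, SPACE - d)


def responsible_peer(peer_keys: dict[str, int], object_key: int) -> str:
    # Pick the peer whose key is closest to the object's key (ties: smaller key).
    return min(peer_keys,
               key=lambda p: (ring_distance(peer_keys[p], object_key), peer_keys[p]))


peers = {name: h(name) for name in ["peer-x", "peer-u", "peer-v"]}
print(responsible_peer(peers, h("object-y")))
```

Because peers and objects share one key space, any node can compute where an object belongs without consulting a directory.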

  24. How to Find an Object? [Figure: a hash table spanning 0 to 2^128 - 1 with peer nodes] • Simplest idea: everyone knows everyone else, so one hop finds the object • But we want each peer to keep only a few entries!

  25. Using a Distributed Hash Table (DHT) • A peer only needs to know its logical neighbors • Search is based on multihop routing [Figure: a hash table spanning 0 to 2^128 - 1 with peer nodes]
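One way to keep routing tables small and still route in a few hops is Chord's finger table, and Chord is one of the DHTs listed in slide 18. This sketch is a Chord-style simulation under stated assumptions. It uses a 6-bit identifier space, a hypothetical node set, and finger tables computed from a global view instead of being maintained by a join protocol. Each node knows only successor(n + 2^i) for i = 0..M-1 and forwards the lookup to the finger that most closely precedes the key.

```python
M = 6                  # hypothetical 6-bit identifier space (0..63), not 128 bits
SPACE = 2 ** M


def successor(ring, ident):
    # First node at or clockwise after `ident`; `ring` is a sorted list of ids.
    for node in ring:
        if node >= ident:
            return node
    return ring[0]     # wrap around past the largest id


def in_open(x, a, b):
    # True if x lies strictly inside the clockwise interval (a, b).
    if a < b:
        return a < x < b
    return x > a or x < b   # interval wraps past 0 (a == b means "all but a")


def fingers(ring, n):
    # Chord-style routing state: successor(n + 2^i) for i = 0 .. M-1.
    # Computed from a global view here only to keep the simulation short.
    return [successor(ring, (n + 2 ** i) % SPACE) for i in range(M)]


def lookup(nodes, start, key):
    """Route a lookup for `key` from node `start`; return (owner, hops)."""
    ring = sorted(nodes)
    n, hops = start, 0
    while True:
        table = fingers(ring, n)
        succ = table[0]
        if in_open(key, n, succ) or key == succ:   # key in (n, succ]
            return succ, hops + 1                  # final hop to the owner
        # Forward to the finger that most closely precedes the key.
        nxt = next((f for f in reversed(table) if in_open(f, n, key)), succ)
        n, hops = nxt, hops + 1


nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(lookup(nodes, 8, 54))   # (56, 3)
```

Each node stores only M entries. The lookup from node 8 for key 54 goes 8 → 42 → 51 → 56 instead of walking the whole ring.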

  26. DHT in action [Figure: peer nodes, each storing (K, V) pairs]

  27. DHT in action

  28. DHT in action • Operation: take a key as input; route messages to the node holding the key

  29. DHT in action: put() • insert(K1,V1) • Operation: take a key as input; route messages to the node holding the key

  30. DHT in action: put() • insert(K1,V1) • Operation: take a key as input; route messages to the node holding the key

  31. DHT in action: put() [Figure: (K1,V1) stored at the node holding K1] • Operation: take a key as input; route messages to the node holding the key

  32. DHT in action: get() • retrieve(K1) • Operation: take a key as input; route messages to the node holding the key

  33. DHT in action • retrieve(K1)
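Slides 26 to 33 show the two DHT operations. put() routes insert(K1,V1) to the node holding K1, and get() routes retrieve(K1) to that same node. Here is a toy in-memory sketch of that interface. It assumes a 16-bit identifier space and that the owner of an identifier is its successor on the ring. It also abstracts multihop routing into a direct lookup. The class and node ids are hypothetical.

```python
import bisect
import hashlib

M = 16  # hypothetical small identifier space (0 .. 2^16 - 1)


def key_id(key: str) -> int:
    # Hash the key into the identifier space.
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % (2 ** M)


class ToyDHT:
    """Toy DHT: each node keeps a local dict of the (K, V) pairs it owns.

    Assumption: the owner of an id is its successor on the ring, and
    multihop routing is abstracted away as a direct lookup of that owner.
    """

    def __init__(self, node_ids):
        self.ring = sorted(node_ids)
        self.store = {n: {} for n in self.ring}

    def responsible(self, ident: int) -> int:
        i = bisect.bisect_left(self.ring, ident)  # first node id >= ident
        return self.ring[i % len(self.ring)]      # wrap past the largest id

    def put(self, key: str, value) -> int:
        node = self.responsible(key_id(key))      # "route" insert(K, V) to the owner
        self.store[node][key] = value
        return node

    def get(self, key: str):
        node = self.responsible(key_id(key))      # same key, same owner
        return self.store[node].get(key)


dht = ToyDHT([1000, 20000, 41000, 60000])
owner = dht.put("K1", "V1")
print(dht.get("K1"))   # V1
```

The design relies on put() and get() deriving the node from the key alone. That is why no node needs a global catalog.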

  34. CAN – Content Addressable Network • Each peer is responsible for one zone, i.e., stores all (key, value) pairs of the zone • Each peer knows the neighbors of its zone • Random assignment of peers to zones at startup • Dimension-ordered multihop routing
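The following sketches dimension-ordered routing between CAN zones, where each peer knows only the zones next to its own. As a simplifying assumption, the 2-d space is split into a regular k × k grid of equal zones indexed (col, row) with no wraparound. Real CAN zones come from repeated splits and have varying sizes, and its coordinate space wraps as a torus. The function names are hypothetical.

```python
def neighbors(zone, k):
    # Zones sharing an edge with `zone` in a k x k grid (no wraparound assumed).
    x, y = zone
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < k and 0 <= j < k]


def can_route(src, dst):
    """Dimension-ordered routing: fix the x coordinate first, then y.

    Each step moves to an adjacent zone, i.e., a neighbor the current peer knows.
    """
    path = [src]
    x, y = src
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(can_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```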

  35. CAN: Object Publishing • Node I calls I::publish(K,V)

  36. CAN: Object Publishing • Node I calls I::publish(K,V) • (1) a = hx(K) [Figure: the line x = a]

  37. CAN: Object Publishing • Node I calls I::publish(K,V) • (1) a = hx(K), b = hy(K) [Figure: the point (x = a, y = b)]

  38. CAN: Object Publishing • Node I calls I::publish(K,V) • (1) a = hx(K), b = hy(K) • (2) route (K,V) -> J

  39. CAN: Object Publishing • Node I calls I::publish(K,V) • (1) a = hx(K), b = hy(K) • (2) route (K,V) -> J • (3) J stores (K,V)

  40. CAN: Object Retrieval • Node I calls I::retrieve(K) • (1) a = hx(K), b = hy(K) • (2) route “retrieve(K)” to J, which is in charge of (a,b) [Figure: J holds (K,V)]
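Steps (1) to (3) of slides 35 to 40 can be sketched as follows. The sketch assumes two independent hash functions hx and hy, each built here from SHA-256 with a different prefix, that map a key into [0, 1). It also assumes a fixed 4 × 4 grid of zones, and it abstracts routing to J as a direct lookup of the zone containing (a, b). All of these choices and names are hypothetical. Publishing and retrieval recompute the same point, so they reach the same peer J.

```python
import hashlib

GRID = 4  # hypothetical: [0, 1) x [0, 1) split into a 4 x 4 grid of zones


def _unit_hash(prefix: bytes, key: str) -> float:
    # 48 hash bits divided by 2^48: exact in a float and always < 1.
    digest = hashlib.sha256(prefix + key.encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2 ** 48


def hx(key: str) -> float:
    return _unit_hash(b"x:", key)   # hypothetical x-dimension hash


def hy(key: str) -> float:
    return _unit_hash(b"y:", key)   # hypothetical, independent y-dimension hash


def zone_of(a: float, b: float) -> tuple[int, int]:
    # The peer J in charge of point (a, b) owns the grid cell containing it.
    return int(a * GRID), int(b * GRID)


# Each zone's local store of (K, V) pairs, one per peer in this sketch.
zones: dict[tuple[int, int], dict[str, str]] = {
    (i, j): {} for i in range(GRID) for j in range(GRID)
}


def publish(key: str, value: str) -> tuple[int, int]:
    a, b = hx(key), hy(key)         # (1) hash the key to a point (a, b)
    j = zone_of(a, b)               # (2) route (K, V) to J, the owner of (a, b)
    zones[j][key] = value           # (3) J stores (K, V)
    return j


def retrieve(key: str):
    j = zone_of(hx(key), hy(key))   # same point, therefore same J
    return zones[j].get(key)


publish("K", "V")
print(retrieve("K"))   # V
```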

  41. Some Research Topics • Content-based Image Retrieval in P2P • Location Management in P2P • Security Considerations for DHT • P2P Backup • Wireless P2P
