
Peer-To-Peer Data Management

Hector Garcia-Molina, ICDE Conference, February 28, 2002.


Presentation Transcript


  1. Peer-To-Peer Data Management Hector Garcia-Molina ICDE Conference, February 28, 2002

  2. What is P2P? pastry, jxta, can, fiorana, napster, freenet, united devices, open cola, aim, ocean store, netmeeting, farsite, gnutella, icq, ebay, morpheus, limewire, seti@home, bearshare, uddi, grove, jabber, popular power, kazaa, folding@home, tapestry, mojo nation, process tree, chord

  3. Napster (figure): central index; peers join, send a query to the index, receive an answer, then get the file ...
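The labels on this slide (join, query, answer, get) can be read as a small protocol around a central index. The following is a minimal Python sketch of that reading, not Napster's actual protocol: the `CentralIndex` class, the peer names, and the file contents are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central index: maps file names to the peers holding them."""

    def __init__(self):
        self._holders = defaultdict(set)

    def join(self, peer, files):
        # A peer joins by registering the names of the files it shares.
        for name in files:
            self._holders[name].add(peer)

    def query(self, name):
        # The answer lists peers only; the index never stores file content.
        return sorted(self._holders.get(name, ()))


# Each peer keeps its own files (illustrative data).
peer_files = {
    "alice": {"song.mp3": b"la la"},
    "bob": {"talk.pdf": b"slides"},
}

index = CentralIndex()
for peer, files in peer_files.items():
    index.join(peer, files)


def get(peer, name):
    # The transfer goes directly to the holding peer, bypassing the index.
    return peer_files[peer][name]


holders = index.query("song.mp3")
data = get(holders[0], "song.mp3")
```

The design point the sketch tries to capture is that only the lookup is centralized; storage and transfer stay at the peers.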

  4. Gnutella (figure): query forwarded from peer to peer
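With no central index, a query has to travel from neighbor to neighbor. One way to picture this is a breadth-first flood with a hop limit. The sketch below assumes a hypothetical five-node overlay and a `ttl` parameter; it also counts copies sent back toward the sender, which a real client would likely avoid, so the message counts are only indicative.

```python
from collections import deque


def flood_query(graph, files, start, wanted, ttl):
    """Breadth-first flood from `start`; returns (hits, messages sent)."""
    seen = {start}
    hits = []
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        node, hops_left = frontier.popleft()
        if wanted in files.get(node, ()):
            hits.append(node)
        if hops_left == 0:
            continue  # hop limit reached: do not forward further
        for neighbor in graph[node]:
            messages += 1  # every forwarded copy costs one message
            if neighbor not in seen:  # duplicate copies are dropped on arrival
                seen.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return hits, messages


# Hypothetical overlay and file placement.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
files = {"d": {"x.mp3"}, "e": {"x.mp3"}}

near = flood_query(graph, files, "a", "x.mp3", ttl=2)
far = flood_query(graph, files, "a", "x.mp3", ttl=3)
```

Raising the hop limit from 2 to 3 finds one more copy in this toy overlay but sends more messages, which is the cost the later slides on improving flooding are concerned with.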

  5. Morpheus (figure): super peer ...

  6. Seti@Home (figure): satellite dish, central site, raw data chunk, analyzed data ...

  7. Lockss (figure): library A, library B, library C, library D, library E; documents D1, D2, D3

  8. PeerCast (figure): before and after, each with the Stanford source

  9. What is a P2P System? • Multiple sites (at edge) • Distributed resources • Sites are autonomous (different owners) • Sites are both clients and servers • Sites have equal functionality (figure label: P2P Purity)

  10. P2P is BAD IDEA!! • Distribution is expensive! • Specialized functionality is good!

  11. Example: Distributed Data Management • Distribution is expensive • If you must distribute: • build centralized directory, index • use backups for reliability • for replicated data, use primary copy • use backups for reliability

  12. Computational Efficiency is NOT Main Goal • Main driving force in a P2P system: • exploiting existing (often free) resources • sharing costs among many • legal protection • autonomy • anonymity

  13. Should We Do P2P Research? • Should we help people break the law? • Analogy: Should we develop pillows, knives, hammers, drugs, bath tubs, cars, airplanes, ... ??

  14. Should We Do P2P Research? • YES: P2P not exclusively for breaking law • Remember the VCR • YES: P2P can liberate us from culture “plantation owners” (Lessig)

  15. Is “Free Culture” Feasible? • Example: Legal texts • Can we afford it? (figure labels: today, economic activity, rules of the game)

  16. Should DB community work on P2P? • YES

  17. P2P Challenges • Easier to list NON-Research-Topics: • Color schemes for P2P Nodes • Impact of P2P on Moroccan 15th Century Literature

  18. P2P Challenges • Search • Resource Management • Security & Privacy

  19. Search Taxonomy (figure): freenet, can, gnutella, morpheus, napster placed by scope of index (single site, regional, global); other labels: lookup, content queries, search, partial, replicated, SP, routing

  20. Index Implementation Taxonomy (figure): freenet, gnutella, morpheus, napster, can placed by nature of index (centralized, distributed) and by whether index location is correlated with content location (yes, no); other labels: routing, replicated, SP, partial, P2P

  21. Content Addressable Network (CAN) Nodes 1 Data 2
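CAN, as it is usually described, hashes each key to a point in a coordinate space that is split into zones, with one node responsible for each zone. The sketch below illustrates only that key-to-node mapping in two dimensions, not CAN's routing or join procedure. The zone layout, node names, and the use of SHA-256 are assumptions made for the example.

```python
import hashlib


def key_to_point(key):
    """Hash a key to a point in the unit square [0, 1) x [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32
    y = int.from_bytes(digest[4:8], "big") / 2**32
    return x, y


# Hypothetical zones: node -> (x_min, x_max, y_min, y_max), half-open intervals.
zones = {
    "n1": (0.0, 0.5, 0.0, 1.0),
    "n2": (0.5, 1.0, 0.0, 0.5),
    "n3": (0.5, 1.0, 0.5, 1.0),
}


def owner(point):
    """Return the node whose zone contains the point."""
    x, y = point
    for node, (x0, x1, y0, y1) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node
    raise ValueError("zones do not cover the point")


# Each node stores the (key, value) pairs that hash into its zone.
store = {node: {} for node in zones}


def put(key, value):
    store[owner(key_to_point(key))][key] = value


def get(key):
    return store[owner(key_to_point(key))].get(key)


put("song.mp3", "held by peer-a")
```

Because the owner is computed from the key alone, a lookup for a known key needs no flooding; the price is that only exact-key lookups are supported.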

  22. Can We Improve Flooding? (figure: the Index Implementation Taxonomy chart from slide 20)

  23. Directed BFS in Gnutella • Heuristics for Selecting Direction • >RES: Returned most results • <TIME: Shortest satisfaction time • <HOPS: Min hops for results • >MSG: Sent us most messages (all types) • <QLEN: Shortest queue • <LAT: Shortest latency • >DEG: Highest degree (figure: query ...)
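The heuristics on this slide each rank neighbors by one observed statistic, taking either the largest (">") or the smallest ("<") value. A minimal sketch of that selection step follows. The `NeighborStats` fields, the sample numbers, and the parameter `k` (how many neighbors receive the query) are assumptions for illustration, and only four of the listed heuristics are covered.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    name: str
    results: int       # results returned for past queries
    latency_ms: float  # observed link latency
    queue_len: int     # messages waiting at the neighbor
    degree: int        # number of neighbors it has


# Heuristic name -> (statistic, True if the largest value wins).
HEURISTICS = {
    ">RES": ("results", True),
    "<LAT": ("latency_ms", False),
    "<QLEN": ("queue_len", False),
    ">DEG": ("degree", True),
}


def pick_neighbors(stats, heuristic, k=1):
    """Return the names of the k best neighbors under the given heuristic."""
    field, largest = HEURISTICS[heuristic]
    ranked = sorted(stats, key=lambda s: getattr(s, field), reverse=largest)
    return [s.name for s in ranked[:k]]


# Illustrative statistics, not measurements.
neighbors = [
    NeighborStats("n1", results=40, latency_ms=120.0, queue_len=3, degree=5),
    NeighborStats("n2", results=10, latency_ms=30.0, queue_len=9, degree=2),
    NeighborStats("n3", results=25, latency_ms=80.0, queue_len=1, degree=8),
]
```

For example, `pick_neighbors(neighbors, ">RES", k=2)` sends the query only to the two neighbors with the best track record instead of to all three.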

  24. How Does One Evaluate? • Live Gnutella? • Use real Gnutella as “laboratory”

  25. Time to Satisfaction for Directed BFS

  26. Routing Index (figure): nodes A, B, C, D; each node holds a table with columns DB and AI and one row per entry (for example C 25 50, B 65 20); query Q(DB)
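A routing index of this kind can be read as a per-neighbor table of counts by topic, which a node consults to decide where to forward a query. The sketch below assumes that reading. The table contents reuse two rows of numbers visible on the slide purely as illustrative values, since the transcript does not show which node each table belongs to.

```python
# Hypothetical routing index at one node:
# neighbor -> topic -> number of documents reachable through that neighbor.
routing_index = {
    "B": {"DB": 65, "AI": 20},
    "C": {"DB": 25, "AI": 50},
}


def rank_neighbors(index, topic):
    """Order neighbors from most to least promising for a single-topic query."""
    return sorted(index, key=lambda n: index[n].get(topic, 0), reverse=True)


def next_hop(index, topic):
    """Pick the best neighbor, or None if no neighbor reports any documents."""
    best = rank_neighbors(index, topic)[0]
    return best if index[best].get(topic, 0) > 0 else None


db_order = rank_neighbors(routing_index, "DB")
ai_order = rank_neighbors(routing_index, "AI")
```

Under these assumed numbers a query on DB goes to B first and a query on AI goes to C first, so the query follows a path instead of being flooded.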

  27. Types of Routing Indexes • Compound • Hop Count • Exponential Decay • Strategies for Cycles • Ignore (for Hop-Count, exponential) • Avoid Update Cycles • Detect Update Cycles and Recover
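To make the difference between the index types concrete, here is a sketch of how a hop-count index might be scored. It assumes the index stores, per neighbor, the number of matching documents at each hop distance, and that documents further away are discounted by an assumed `fanout` raised to the number of extra hops. Both the discount formula and the numbers are assumptions for illustration, not taken from the slides.

```python
def hop_count_goodness(counts_by_hop, fanout):
    """Score a neighbor from per-hop document counts.

    counts_by_hop[j] is the number of matching documents reachable in
    j + 1 hops; each extra hop is discounted by the assumed fanout.
    """
    return sum(n / fanout**j for j, n in enumerate(counts_by_hop))


# Illustrative per-hop counts for two neighbors.
x_counts = [13, 10]  # 13 documents one hop away, 10 two hops away
y_counts = [0, 31]   # nothing nearby, 31 documents two hops away

x_score = hop_count_goodness(x_counts, fanout=3)
y_score = hop_count_goodness(y_counts, fanout=3)
```

Neighbor Y reaches more documents in total, but X scores higher because its documents are closer. An exponentially decayed index could store just this single discounted sum per neighbor and topic instead of the per-hop counts, trading accuracy for space.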

  28. Effect of Index Compression

  29. Effect of Network Topology

  30. Resource Management • Resource: • storage (lockss) • CPU processing (seti@home) • bandwidth (PeerCast) • Issues: • fairness • load balancing

  31. Example: Data Trading (figure): site 1, site 2, site 3 holding A1, A2, B1, B2, C1, C2; two trades shown, exchanging copies of A1, A2, B1, B2

  32. Example: Data Trading (figure): site 1, site 2, site 3 holding A1, A2, B1, B2, C1, C2; three trades shown, exchanging copies of A1, B2, C2, B1, A2, C1

  33. Data Trading • Order of trades impacts reliability • Issues: • Swaps vs. Deeds • Fixed price vs. bids • Preference to • sites with a lot of space? • reliable sites? • “desperate” sites?

  34. Effect of Bid Policies • bid more (ask more in return) when I have less free space • bid more (ask more in return) when I have more free space
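The two policies compared here differ only in how a site's bid responds to its free space. A minimal sketch of that contrast follows, assuming a linear relationship and a hypothetical `max_bid` cap; the slides do not give the actual bid functions.

```python
def bid_when_less_space(free, capacity, max_bid):
    """Bid more (ask more in return) as free space shrinks. Linear form assumed."""
    return max_bid * (1 - free / capacity)


def bid_when_more_space(free, capacity, max_bid):
    """Bid more (ask more in return) as free space grows. Linear form assumed."""
    return max_bid * (free / capacity)


# A site with 25 of 100 units free, and an assumed cap of 8 units per bid.
scarce_bid = bid_when_less_space(25, 100, max_bid=8)
ample_bid = bid_when_more_space(25, 100, max_bid=8)
```

With the same state, the first policy produces a high bid and the second a low one, which is the behavioral difference whose effect the slide's plot compares.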

  35. Effect of One Maverick Site • always bids high

  36. Security & Privacy • Issues: • Anonymity • Reputation • Accountability • Information Preservation • Information Quality • Trust • Denial of service attacks

  37. Information Preservation • Example Policy: make 3 copies of documents • What can go wrong? (figure: A1, make copies)

  38. What Can Go Wrong? • “Bad” sites make copies • “Bad” site alters copy • “Bad” site publishes fake • “Bad” site makes many copies of other docs • ... (figure: A1, A1, make copies, A’1)

  39. Conclusion • P2P systems popular today • P2P systems vulnerable and inefficient • Many challenges ahead • Search • Resource Management • Security and Privacy

  40. For Additional Information • Google: “Stanford Peers” • http://www-db.stanford.edu/peers/
