
Peer-to-Peer Filesystems



  1. Peer-to-Peer Filesystems Tom Roeder CS414 2005sp

  2. Nature of P2P Systems • We discussed this a little in 415 on Friday • P2P: communicating peers in the system • normally an overlay in the network • In some sense, P2P is older than the name • many protocols used symmetric interactions • not everything is client-server • What’s the real definition? • no one has a good one yet • it depends on what you want to fit in the class

  3. Nature of P2P Systems • Standard definition • symmetric interactions between peers • no distinguished server • Minimally: is the Web a P2P system? • We don’t want to say that it is • but it is, under this definition • I can always run a server if I want: no asymmetry • There must be more structure than this • Let’s try again

  4. Nature of P2P Systems • Recent definition • No distinguished initial state • Each server has the same code • servers cooperate to handle requests • clients don’t matter: servers are the P2P system • Try again: is the Web P2P? • No, not under this def: servers don’t interact • Is the Google server farm P2P? • Depends on how it’s set up? Probably not.

  5. Overlays • Recall: two types of overlays • Unstructured • No infrastructure set up for routing • Random walks, flood search • Structured • Small World Phenomenon: Kleinberg • Set up enough structure to get fast routing • We will see O(log n) • For special tasks, can get O(1)
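The two unstructured strategies named above, flooding and random walks, can be compared on a toy overlay. This is a minimal sketch, assuming the overlay is an adjacency list and that flooding is bounded by a hop count (TTL); the function names, TTL and step budget are illustrative, not taken from any particular system. Flooding counts every forwarded copy as a message, which is where its bandwidth cost comes from; a random walk sends one message per step but may need many steps.

```python
import random
from collections import deque


def flood_search(graph, start, target, ttl):
    """TTL-limited flooding: every node forwards the query to all of its neighbors.

    Returns (found, messages_sent); duplicate deliveries still cost a message.
    """
    seen = {start}
    frontier = deque([(start, ttl)])
    messages = 0
    while frontier:
        node, hops_left = frontier.popleft()
        if node == target:
            return True, messages
        if hops_left == 0:
            continue
        for nbr in graph[node]:
            messages += 1
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, hops_left - 1))
    return False, messages


def random_walk_search(graph, start, target, max_steps, rng):
    """A single random walker: forward the query to one random neighbor per step.

    Returns (found, steps_taken).
    """
    node = start
    for step in range(max_steps):
        if node == target:
            return True, step
        node = rng.choice(graph[node])
    return node == target, max_steps


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy overlay: a ring of 100 nodes plus one random shortcut per node.
    n = 100
    links = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    for i in range(n):
        j = rng.randrange(n)
        if j != i:
            links[i].add(j)
            links[j].add(i)
    graph = {i: sorted(nbrs) for i, nbrs in links.items()}
    print("flood (found, messages):", flood_search(graph, 0, 50, ttl=8))
    print("walk  (found, steps):   ", random_walk_search(graph, 0, 50, 5000, rng))
```

The ring-plus-shortcuts graph is only there to guarantee connectivity; on a real unstructured overlay the topology comes from whoever happened to connect to whom.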

  6. Overlays: Unstructured • From Gribble • a common unstructured overlay • look at connectivity • more structure than it seems at first

  7. Overlays: Unstructured • Gossip: state synchronization technique • Instead of forced flooding, share state • Do so infrequently with one neighbor at a time • Original insight from epidemic theory • Convergence of state is reasonably fast • with high probability for almost all nodes • good probabilistic guarantees • Trivial to implement • Saves bandwidth and energy consumption
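To see why gossip converges quickly, here is a minimal push-gossip simulation. It assumes synchronous rounds and that any node can contact any other node uniformly at random (a complete graph); both are simplifying assumptions, and `push_gossip_rounds` is a hypothetical name. Since the informed set can at most double per round, at least log2(n) rounds are needed, and in practice the count stays close to that logarithmic scale.

```python
import math
import random


def push_gossip_rounds(n, rng):
    """Rounds of push gossip until all n nodes hold the update.

    Node 0 starts with the update. In each round, every node that already has it
    pushes it to one peer chosen uniformly at random (possibly an already-informed one).
    """
    informed = {0}
    rounds = 0
    while len(informed) < n:
        # The set of targets is built in full before being merged in.
        informed |= {rng.randrange(n) for _ in informed}
        rounds += 1
    return rounds


if __name__ == "__main__":
    rng = random.Random(42)
    for n in (100, 1_000, 10_000):
        print(n, "nodes:", push_gossip_rounds(n, rng), "rounds; log2(n) =", round(math.log2(n), 1))
```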

  8. Overlays: Structured • Need to build up long-distance pointers • think of routing within levels of a namespace • e.g. the namespace is 10-digit base-4 numbers • 0112032101 • then you can hop levels to find other nodes • This is the most common structure imposed
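The "levels of a namespace" idea can be made concrete with the 10-digit base-4 example above. This sketch (function names are hypothetical) writes ids as digit strings and measures how many leading digits two ids share; ids that share l leading digits fall in a block of 4^(10 − l) ids, so every extra matched digit narrows the search by a factor of 4.

```python
def to_digits(node_id, base=4, length=10):
    """Write an integer id as a fixed-length digit string, most significant digit first."""
    digits = []
    for _ in range(length):
        node_id, d = divmod(node_id, base)
        digits.append(str(d))
    return "".join(reversed(digits))


def shared_prefix_len(a, b):
    """How many leading digits two equal-length ids share: the deepest level they have in common."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n


if __name__ == "__main__":
    x = to_digits(int("0112032101", 4))
    y = to_digits(int("0113000000", 4))
    level = shared_prefix_len(x, y)
    print(x, y, "share", level, "digits")
    # Ids sharing `level` leading digits fall in a block of 4**(10 - level) ids.
    print("block size at that level:", 4 ** (10 - level), "of", 4 ** 10)
```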

  9. Distributed Hash Tables • One way to do this structured routing • Assign each node an id from the id space • e.g. 128 bits taken from a salted SHA-1 hash of the IP address (SHA-1 itself yields 160 bits) • build up a ring: circular hashing • assign nodes into this space • Value • diversity of neighbors • even coverage of space • less chance of attack?
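A minimal sketch of id assignment on the ring, assuming ids are the top 128 bits of SHA-1 over a salt followed by the IP address (the salt handling and function names are hypothetical). The demo hashes 1000 consecutive addresses and reports the largest gap between neighboring ids relative to the ideal even spacing, which shows both the "even coverage" and the "diversity of neighbors" points: adjacent IPs land far apart on the ring.

```python
import hashlib

ID_BITS = 128
RING = 2 ** ID_BITS


def node_id(ip, salt=b""):
    """Hypothetical id assignment: SHA-1 over salt + IP, keeping the top 128 of its 160 bits."""
    digest = hashlib.sha1(salt + ip.encode()).digest()  # 20 bytes = 160 bits
    return int.from_bytes(digest, "big") >> (160 - ID_BITS)


def ring_distance(a, b):
    """Distance between two ids on the circular id space, going the shorter way round."""
    d = (a - b) % RING
    return min(d, RING - d)


if __name__ == "__main__":
    ips = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]
    ids = sorted(node_id(ip, salt=b"demo-salt") for ip in ips)
    # Gaps between consecutive ids, including the wrap-around gap from the last id back to the first.
    gaps = [(ids[(i + 1) % len(ids)] - ids[i]) % RING for i in range(len(ids))]
    print("largest gap / ideal gap:", round(max(gaps) / (RING / len(ids)), 1))
```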

  10. Distributed Hash Tables • Why “hash tables”? • Store named objects by hash code • Route each object to the node whose id is nearest to it in the space • key idea: nodes and objects share the id space • How do you find an object without its name? • Close names don’t help because of hashing • Cost of churn? • In most P2P apps, many joins and leaves • Cost of freeloaders?
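Because nodes and objects share one id space, "which node stores this object?" reduces to "which node id is numerically closest to the object's hash?". The sketch below assumes numerically-closest placement on a 128-bit ring (as Pastry does; Chord instead uses the successor node) and hypothetical names throughout. It also illustrates one answer to the cost-of-churn question: when a node leaves, only the objects it was responsible for have to move.

```python
import hashlib

RING = 2 ** 128


def to_id(name):
    """Hash any name (object or node) into the shared 128-bit id space: top 128 bits of SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") >> 32


def ring_distance(a, b):
    d = (a - b) % RING
    return min(d, RING - d)


def responsible_node(key, node_ids):
    """The node numerically closest to the key stores the object (ties broken by smaller id)."""
    return min(node_ids, key=lambda n: (ring_distance(n, key), n))


def assign(names, node_ids):
    return {name: responsible_node(to_id(name), node_ids) for name in names}


if __name__ == "__main__":
    nodes = [to_id(f"node-{i}") for i in range(20)]
    names = [f"file-{i}" for i in range(1000)]
    before = assign(names, nodes)
    departed = nodes[3]  # simulate churn: one node leaves
    after = assign(names, [n for n in nodes if n != departed])
    moved = [name for name in names if before[name] != after[name]]
    print(len(moved), "of", len(names), "objects moved; all from the departed node:",
          all(before[name] == departed for name in moved))
```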

  11. Distributed Hash Tables • Dangers • Sybil attacks: one node becomes many • id attacks: can place your node wherever you like • Solutions hard to come by • crypto puzzles / money for IDs? • Certification of routing and storage? • Many routing frameworks in this spirit • Very popular in the early 2000s • Pastry, Tapestry, CAN, Chord, Kademlia
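One way to read "crypto puzzles for IDs" is a hashcash-style cost on minting an id. The scheme below is a hypothetical illustration, not taken from any of the systems listed: a node's id is SHA-1 of its IP and a nonce, and the id is accepted only if a second hash of it starts with `difficulty` zero bits. Minting takes about 2^difficulty tries, while anyone can verify a claimed id with two hashes. Checking the puzzle on a second hash keeps accepted ids spread over the whole space. This raises the price of Sybil and id-placement attacks; it does not prevent them.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Number of leading zero bits in a byte string."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def candidate_id(ip, nonce):
    return hashlib.sha1(f"{ip}:{nonce}".encode()).digest()


def puzzle_solved(id_digest, difficulty):
    # Checked on a second hash so that accepted ids are not all clustered near zero.
    return leading_zero_bits(hashlib.sha1(id_digest).digest()) >= difficulty


def mint_id(ip, difficulty):
    """Try nonces until the id passes the puzzle: about 2**difficulty attempts on average."""
    for nonce in count():
        digest = candidate_id(ip, nonce)
        if puzzle_solved(digest, difficulty):
            return nonce, int.from_bytes(digest, "big")


def verify_id(ip, nonce, claimed_id, difficulty):
    """Checking a claimed id costs only two hashes."""
    digest = candidate_id(ip, nonce)
    return int.from_bytes(digest, "big") == claimed_id and puzzle_solved(digest, difficulty)


if __name__ == "__main__":
    nonce, nid = mint_id("10.0.0.1", difficulty=12)
    print("nonce", nonce, "id", hex(nid), "valid:", verify_id("10.0.0.1", nonce, nid, 12))
```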

  12. Applications of DHTs • Almost anything that involves routing • illegal file sharing: obvious application • backup/storage • filesystems • P2P DNS • Good properties • O(log N) hops to find an id • Non-fate-sharing id neighbors • Random distribution of objects to nodes
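The O(log N) figure can be made concrete for Pastry, which reads ids as digits in base 2^b (the Pastry paper gives b = 4 as a typical value). Each routing-table hop fixes at least one more digit, so a lookup takes about ⌈log_{2^b} N⌉ hops, and each node's routing table has that many rows of 2^b − 1 entries. For a hypothetical network of N = 10^6 nodes with b = 4:

```latex
\lceil \log_{2^b} N \rceil
  = \left\lceil \frac{\log_{10} 10^6}{\log_{10} 16} \right\rceil
  = \left\lceil \frac{6}{1.204} \right\rceil
  = \lceil 4.98 \rceil = 5 \ \text{hops},
\qquad
5 \times (2^4 - 1) = 75 \ \text{routing-table entries}.
```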

  13. Pastry: Node state

  14. Pastry: Node Joins • Find another geographically nearby node • Hash IP address to get Pastry id • Try to route a join message to this id • get routing tables from each hop and dest • select neighborhood set from nearby node • get the leaf set from the destination • Give info back to nodes so they can add you • Assuming the Pastry ring is well set up, this procedure will give good parameters
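Only the "get the leaf set from the destination" step is sketched here; routing-table rows and the neighborhood set are left out. A Pastry leaf set holds the numerically closest ids on each side of a node. The assumption in this sketch is that the joining node X takes the leaf set of the destination Z (the existing node closest to X), adds Z itself, and keeps the entries closest to its own id. A tiny 8-bit ring keeps the numbers readable; the function names are hypothetical.

```python
RING = 2 ** 8  # tiny id space so the example is readable; Pastry uses 128-bit ids


def leaf_set(own_id, candidates, size):
    """Keep the size//2 numerically closest ids above own_id and size//2 below (wrapping)."""
    others = {c for c in candidates if c != own_id}
    above = sorted(others, key=lambda c: (c - own_id) % RING)[: size // 2]
    below = sorted(others, key=lambda c: (own_id - c) % RING)[: size // 2]
    return sorted(set(above) | set(below))


def join_leaf_set(new_id, dest_id, dest_leaf_set, size):
    """A joining node's leaf set, built from the destination Z's leaf set plus Z itself."""
    return leaf_set(new_id, list(dest_leaf_set) + [dest_id], size)


if __name__ == "__main__":
    ring = [10, 20, 30, 40, 50, 60, 70]
    z = 40                                # the join message for id 42 ends at node 40
    z_leaves = leaf_set(z, ring, size=4)  # [20, 30, 50, 60]
    print(join_leaf_set(42, z, z_leaves, size=4))  # -> [30, 40, 50, 60]
```

In this example the result matches the leaf set node 42 would compute with full knowledge of the ring, which is why copying from the numerically closest node is a good starting point.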

  15. Pastry: Node Joins • Consider what happens from node 0 • it bootstraps itself • the next node to arrive adds itself and adds this node • Neighborhood information will be bad for a while • need a good way to discover network proximity • This is a current research problem • On node leaves, do the reverse • If a node leaves suddenly, this must be detected • the detecting node removes it from its tables
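Detecting a sudden departure usually comes down to noticing silence. One simple way, sketched here under the assumption of periodic keep-alive messages and a fixed timeout, is to timestamp every message from a neighbor and drop neighbors that stay quiet too long. `NeighborMonitor` and its methods are hypothetical; time is passed in explicitly so the logic is easy to test.

```python
class NeighborMonitor:
    """Hypothetical failure detector for one node's table entries.

    Every message or keep-alive from a neighbor refreshes its timestamp; a neighbor
    silent for longer than `timeout` seconds is presumed failed and removed.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heard = {}

    def heard_from(self, node, now):
        self.last_heard[node] = now

    def expire(self, now):
        """Drop and return every neighbor not heard from within the timeout."""
        dead = sorted(n for n, t in self.last_heard.items() if now - t > self.timeout)
        for n in dead:
            del self.last_heard[n]
        return dead

    def live(self):
        return sorted(self.last_heard)


if __name__ == "__main__":
    m = NeighborMonitor(timeout=3.0)
    m.heard_from("A", now=0.0)
    m.heard_from("B", now=0.0)
    m.heard_from("A", now=2.5)          # A keeps talking, B goes quiet
    print(m.expire(now=4.0), m.live())  # -> ['B'] ['A']
```

The timeout is the usual trade-off: too short and slow but live nodes get evicted, too long and lookups keep hitting a dead node.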

  16. Pastry: Routing • The key idea: grow the common prefix • given an object id, try to send to a node with at least one more digit in common • if not possible, send to a node with an equally long common prefix that is numerically closer • if not possible, then you are the destination • Gives O(log N) hops • Each step gets closer to the destination • Guaranteed to converge
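The routing rules above can be sketched in Python. Assumptions: ids are equal-length base-4 digit strings, as in the 10-digit base-4 example earlier; each node's routing table is modeled as a flat list of nodes it knows; numeric distance ignores wrap-around. Pastry also checks the leaf set (the numerically closest nodes) before applying the prefix rule, and that check is included as step 0, because the prefix rule alone can stop at a node that is not numerically closest to the key. All names are hypothetical.

```python
def shared_prefix_len(a, b):
    """Leading digits two equal-length ids have in common."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n


def next_hop(current, key, leaf_set, table, base=4):
    """One routing decision at `current` for `key`; None means `current` is the destination."""
    def value(s):
        return int(s, base)

    def dist(s):
        return abs(value(s) - value(key))

    # 0. Key inside the numeric range of the leaf set: go to the numerically closest node.
    members = leaf_set + [current]
    if min(map(value, members)) <= value(key) <= max(map(value, members)):
        best = min(members, key=lambda s: (dist(s), s))
        return None if best == current else best
    here = shared_prefix_len(current, key)
    # 1. Prefer a node that shares at least one more digit with the key.
    longer = [n for n in table if shared_prefix_len(n, key) > here]
    if longer:
        return max(longer, key=lambda n: shared_prefix_len(n, key))
    # 2. Otherwise, a node with an equally long prefix that is numerically closer.
    closer = [n for n in table + leaf_set
              if shared_prefix_len(n, key) >= here and dist(n) < dist(current)]
    if closer:
        return min(closer, key=dist)
    # 3. Nothing better is known: this node is the destination.
    return None


def route(start, key, leaf_sets, tables, base=4, max_hops=32):
    """Follow next_hop from `start` until some node declares itself the destination."""
    path = [start]
    for _ in range(max_hops):
        hop = next_hop(path[-1], key, leaf_sets[path[-1]], tables[path[-1]], base)
        if hop is None:
            break
        path.append(hop)
    return path


if __name__ == "__main__":
    nodes = ["0000", "0123", "0132", "1023", "2301", "3012"]
    order = sorted(nodes, key=lambda s: int(s, 4))
    # One leaf on each side; every node "knows" every other node in its table.
    leaves = {n: [order[i + j] for j in (-1, 1) if 0 <= i + j < len(order)]
              for i, n in enumerate(order)}
    tables = {n: [m for m in nodes if m != n] for n in nodes}
    print(route("3012", "0130", leaves, tables))  # -> ['3012', '0132', '0123']
```

Rules 1 and 2 only ever lengthen the shared prefix or keep it while moving numerically closer, which is the convergence argument on the slide; the `max_hops` cap is just a guard for inconsistent tables.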

  17. Pastry: Routing

  18. PAST: Pastry Filesystem • Now a simple filesystem follows: • to get a file, hash its name and look it up in Pastry • to store a file, store it in Pastry under that hash • Punt on metadata/discovery • Can implement directories as files • Then just need to know the name of the root • Shown to give reasonable utilization of storage space
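A toy, in-memory version of "hash the name, store at the closest node, directories are just files". Everything here is a hypothetical stand-in: each node is a Python dict, the fileId is the top 128 bits of SHA-1 of the path, and a directory is stored as a JSON list of child names. Files are write-once, mirroring PAST's immutable files; `insert` and `lookup` are only loosely named after PAST's operations.

```python
import hashlib
import json

RING = 2 ** 128


def file_id(name):
    """Hash a file name into the 128-bit id space (top 128 bits of SHA-1)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") >> 32


class ToyPast:
    """Hypothetical stand-in for a PAST-like store: a file lives on the node
    whose id is numerically closest to the hash of its name."""

    def __init__(self, node_ids):
        self.nodes = {nid: {} for nid in node_ids}

    def _closest(self, fid):
        def dist(n):
            d = (n - fid) % RING
            return min(d, RING - d)
        return min(self.nodes, key=lambda n: (dist(n), n))

    def insert(self, name, data):
        """Write-once: inserting a name that already exists fails."""
        fid = file_id(name)
        store = self.nodes[self._closest(fid)]
        if fid in store:
            return False
        store[fid] = data
        return True

    def lookup(self, name):
        fid = file_id(name)
        return self.nodes[self._closest(fid)].get(fid)


def write_directory(past, path, children):
    """A directory is just another file: here, a JSON list of child names."""
    return past.insert(path, json.dumps(sorted(children)).encode())


def read_directory(past, path):
    raw = past.lookup(path)
    return None if raw is None else json.loads(raw)


if __name__ == "__main__":
    past = ToyPast([file_id(f"node-{i}") for i in range(16)])
    past.insert("/docs/notes.txt", b"hello")
    write_directory(past, "/docs", ["notes.txt"])
    write_directory(past, "/", ["docs"])  # only the root's name must be known in advance
    for child in read_directory(past, "/"):
        print(child, read_directory(past, "/" + child))
```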

  19. PAST: File Replication • Since any one node might fail, replicate • k-way storage on the k nodes whose ids are numerically closest to the file id (found in the leaf set) • Keeps the same file at each of these nodes • Diversity of id neighbors helps avoid fate-sharing • Certification • Each node signs a certificate • Says that it stored the file • Client will retry storage if not enough certificates • OK guarantees
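A minimal sketch of k-way replication with a receipt count. Assumptions: replicas go to the k node ids numerically closest to the file id; `try_store(node, fid)` is a hypothetical stand-in for sending the file and getting back a signed store certificate (no real signatures here); and retries simply re-ask the same replica nodes, which is a simplification.

```python
RING = 2 ** 128


def ring_distance(a, b):
    d = (a - b) % RING
    return min(d, RING - d)


def replica_set(fid, node_ids, k):
    """The k nodes whose ids are numerically closest to the file id."""
    return sorted(node_ids, key=lambda n: (ring_distance(n, fid), n))[:k]


def insert_replicated(fid, node_ids, k, try_store, attempts=3):
    """Ask each replica node to store the file and collect a receipt from each success.

    Missing replicas are retried up to `attempts` times; the insert succeeds only
    once k receipts have been collected.
    """
    targets = replica_set(fid, node_ids, k)
    receipts = set()
    for _ in range(attempts):
        for n in targets:
            if n not in receipts and try_store(n, fid):
                receipts.add(n)
        if len(receipts) >= k:
            return True, sorted(receipts)
    return False, sorted(receipts)


if __name__ == "__main__":
    failures_left = {40: 1}  # node 40 drops the first request, then succeeds

    def flaky_store(node, fid):
        if failures_left.get(node, 0) > 0:
            failures_left[node] -= 1
            return False
        return True

    print(insert_replicated(32, [10, 20, 30, 40, 50], k=3, try_store=flaky_store))
    # -> (True, [20, 30, 40])
```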

  20. PAST: Tradeoffs • No explicit FS structure: • Could build any sort of system by storing files • Basically variable-sized block storage mechanism • This buys simplicity at the cost of optimization • Speed vs. storage • See Beehive for this tradeoff • Makes it an explicit formula; can be tuned • Ease of use vs. security • Hashes make file discovery non-transparent

  21. Rationale and Validation • Backing up on other systems • no fate sharing • automatic backup by storing the file • But • Cost much higher than regular filesystem • Incentives: why should I store your files? • How is this better than tape backup? • How is this affected by churn/freeloaders? • Will anyone ever use it?

  22. PAST: comparison to CFS • CFS: a filesystem built on Chord/DHash • Pastry is from MSR, Chord/DHash from MIT • Very similar routing and storage

  23. PAST: comparison to CFS • PAST stores files, CFS blocks • Thus CFS can use space at a finer grain • but a file lookup could take much longer • get each block: must go through routing for each • CFS claims: ftp-like speed • Could imagine much faster: get blocks in parallel • thus routing is slowing them down • Remember: hops here are overlay, not internet, hops • Load balancing in CFS • predictable storage requirements per file per node
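To make the block-versus-file difference concrete: CFS identifies blocks by the SHA-1 hash of their contents, so a file becomes a list of block keys, and reading it costs one DHT lookup per block. This sketch assumes a fixed block size and a hypothetical `lookup(key) -> bytes` function standing in for a DHT fetch; it issues the per-block lookups concurrently, as the slide suggests, and checks each block against its content hash.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 8192  # hypothetical block size for this sketch


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a file into fixed-size blocks keyed by the SHA-1 of their contents."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [(hashlib.sha1(b).hexdigest(), b) for b in blocks]


def fetch_file(keys, lookup, parallel=8):
    """Fetch every block concurrently; each lookup would be a separate DHT routing operation."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        blocks = list(pool.map(lookup, keys))  # map() preserves the order of `keys`
    for key, block in zip(keys, blocks):
        if hashlib.sha1(block).hexdigest() != key:
            raise ValueError(f"block {key} failed its content-hash check")
    return b"".join(blocks)


if __name__ == "__main__":
    data = bytes(range(256)) * 100
    pairs = split_into_blocks(data)
    store = dict(pairs)  # a plain dict stands in for the DHT
    keys = [k for k, _ in pairs]
    print(len(keys), "blocks; round trip ok:", fetch_file(keys, store.__getitem__) == data)
```

With content-hash keys, identical blocks share a key, and a corrupted or substituted block is caught by the hash check rather than trusted.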

  24. References • A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November 2001. • A. Rowstron and P. Druschel, "Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility", ACM Symposium on Operating Systems Principles (SOSP'01), Banff, Canada, October 2001. • Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications", ACM SIGCOMM 2001, San Diego, CA, August 2001, pp. 149-160.

  25. References • Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica, "Wide-area cooperative storage with CFS", ACM SOSP 2001, Banff, October 2001. • Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble, "A Measurement Study of Peer-to-Peer File Sharing Systems", Proceedings of Multimedia Computing and Networking 2002 (MMCN'02), San Jose, CA, January 2002. • Kleinberg • C. G. Plaxton, R. Rajaraman, and A. W. Richa, "Accessing nearby copies of replicated objects in a distributed environment", Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures, Newport, Rhode Island, pages 311-320, June 1997.

  26. Conclusions • Tradeoffs are critical • Why are you using it? • What sort of security/anonymity guarantees? • DHT applications • Think of a good one and become famous • PAST • caches whole files • saves some routing overhead • harder to implement a true filesystem
