CS522: Algorithmic and Economic Aspects of the Internet

CS522: Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica (nickle@microsoft.com) Mohammad Mahdian (mahdian@microsoft.com)

Peer-To-Peer Networks • A network in which nodes employ distributed resources to accomplish critical task • Nodes are typically equals, i.e. (approximately) indistinguishable in functionality • System is highly dynamic, nodes frequently come and go

P2P Definition Distributed systems consisting of interconnected nodes able to self-organize into network topologies with the purpose of sharing resources such as content, CPU cycles, storage and bandwidth, capable of adapting to failures and accommodating transient populations of nodes while maintaining acceptable connectivity and performance, without requiring the intermediation or support of a global centralized server or authority. – A Survey of Peer-To-Peer Content Distribution Technologies, Androutsellis-Theotokis and Spinellis

Peer-To-Peer Applications • Direct real-time communication: instant messaging • Combine processing power of multiple distributed machines to perform complex computations: analysis of SETI data, prime computation • Store and distribute digital content: mp3 file sharing

Peer-To-Peer Benefits • Self-organized and adaptive • Easily scalable • Fault-tolerant and load balanced • Resistant to censorship

P2P Construction Clients Servers SPRINT P2P overlay network AOL MIT UUNET

P2P Classification Data organization Centralization

Napster, IM 2 3 Reply: 6 Query: U2 • Centralized servers maintain list of files and peer at which file is stored • Peers join, leave, and query network via direct communication with servers • File transfers occur directly between peers 1 Server 4 7 File Transfer 5 6

Napster, IM • Advantages: • Highly efficient data lookup • Rapidly adapts to changes in network • Disadvantages: • Questionable scalability • Vulnerable to censorship, failure, attack

Gnutella • All peers, called servents, are identical and function as both servers and clients • A peer joins network by contacting existing servents (chosen from online databases) using PING messages • A servent receiving a PING message replies with a PONG message and forwards PING to other servents • Peer connects to servents who send PONG

Gnutella • A servent queries network by sending a QUERY message • A servent receiving a QUERY message replies with a QUERYHIT message if he can answer the query. If not, he forwards QUERY message to other servents

Routing in Gnutella • How PING/QUERY messages are forwarded affects network topology, search efficiency/accuracy, and scalability • Proposals • Breadth-First-Search: flooding, iterative deepening, modified random BFS • Depth-First-Search: random walk, k-walker random walks, two-level random walk, dominating set based search

Hybrid Search • Random walk with lookahead: short random walks with shallow local flooding • Takes advantage of “supernodes” or nodes of high degree • Stationary dist. of random walk is naturally biased towards supernodes • Lookahead allows search to quickly discover content stored at all neighbors of these high-degree nodes

Supernodes • Improve scalability and performance of Gnutella-like systems via supernodes • Supernodes are special peers with high degree, elected dynamically according to bandwidth and other considerations • Supernodes maintain a list of content stored at peers • Advantages: • Searches propagate on supernodes, 3 to 5 times faster • Takes advantage of heterogeneity in network

Gnutella • Advantages • Entirely decentralized, pure P2P network • Highly resistant to failure • Disadvantages • Search is time-consuming • Network typically scales poorly

Chord • Distributed hash table (DHT) implementation • Each node/piece of content has an ID • Content IDs are deterministically mapped to node IDs so a searcher knows exactly where data is located, a content addressable network • Efficient: O(log n) messages per lookup • Scalable: O(log n) state per node

Keys in Chord • m bit identifier space for both nodes and content keys • Content ID = hash(content) • Node ID = hash(IP address) • Both are uniformly distributed • How to map content IDs to node IDs?

Mapping Content to Nodes 0 K5 IP=“198.10.10.1” N123 K20 Circular 7-bit ID space K101 N32 Content = “U2” N90 K60 Figure adapted from Stoica et al. Content is stored at successor node, node with next higher ID

Routing • Every node knows of every other node • Routing tables O(n), lookup O(1) N10 Where is “U2”? Hash(“U2”) = K60 N123 N32 “N90 has K60” K60 N90 N55 Figure adapted from Stoica et al.

Routing • Every node knows its successor in ring • Routing tables O(1), lookup O(n) N10 Where is “U2”? Hash(“U2”) = K60 N123 N32 “N90 has K60” K60 N90 N55 Figure adapted from Stoica et al.

Routing • Every node knows m others • Distances increase exponentially, node i points to node whose ID is successor of i + 2j for j from 1 to m. These pointers are called fingers. • The finger (routing) table and search time are both O(log n)

Finger Tables N16 N112 80 + 25 80 + 26 N96 80 + 24 80 + 23 80 + 22 80 + 21 80 + 20 N80 Figure adapted from Stoica et al.

Routing with Finger Tables N5 N10 N110 K19 N20 N99 N32 Lookup(K19) N80 N60 Figure adapted from Stoica et al.

Chord Dynamics • When a node joins • Initialize all fingers of new node • Update fingers of existing nodes • Transfer content from successor to new node • When a node leaves • Transfer content to successor

Chord Failures • Churn rate is very high (on average, nodes are in system for only 60 minutes) and events happen concurrently • Churn (esp. ungraceful departures or simultaneous joins/departures) can failure states, e.g. inconsistencies in successor relationships or, worse, loopy states • Requires a lot of maintenance messages to preserve ideal state

Maintenance in P2P • Maintenance protocol ensures global connectivity and efficient lookup by continuously repairing overlay network and routing tables • Maintenance is essential, e.g. when a node • Joins and announces presence • Updates routing table to ensure efficient search • Monitors neighbors for failures/departures • Cost of maintenance protocol can be measured in terms of the rate of maintenance messages

Half Life • Defn. Suppose there are Nt nodes at time t. Let the doubling timet be such that at time t + t, Nt new nodes have arrived. Similarly let halving timet be such that at time t + t, Nt/2 nodes have departed. Then the half life of a system is mint(t, t). • The half life is the average amount of time until half the system has been replaced. • Measures rate of change of system.

Example • Nodes arrive according to Poisson with rate : prob. k arrivals in time t proportional to e-t • Nodes remain for duration exponential rate : prob. node stays for  amount of time is e- • If system in steady state, then arrival rate  must equal departure rate  N, so N = / . • Doubling time t = N/ = 1/, halving time t = (ln 2)/, and so half life = (ln 2)/.

Bounding Maintenance Costs • Thm. There exists a sequence of joins and leaves such that any node that, at any time, has received an average of fewer than k notifications per half-life will be disconnected from the network with prob. at least (1 – 1/e)k. • Cor. Any N-node P2P network that remains connected with prob. at least 1 – 1/N must generate an average of (log N) notifications per node per half life.

Proof of Bound • Consider Poisson arrival rate , exponential waiting time =1 in system. • Suppose node n averages fewer than k notifications per half life and so there is a minimum time t such that at time t, n has received less than kt notifications. • Observe n is isolated at time t with probability at least (1 – 1/(e-1))k.

Maintenance in Chord • Liben-Nowell, Balakrishnan, Karger: (Modified) Chord requires only O(log2 n) maintenance messages per half life to maintain efficiency and correctness of search.

Chord • Advantages: • Highly efficient search • Good load balancing • Disadvantages: • Locality of data is destroyed • Only handles exact match queries, but keyword queries are more prevalent • Most requests are for highly replicated files (needles vs haystack)

Conclusion • Saw several representative P2P systems, each with advantages and disadvantages • Many important issues • Efficiency of search • Ability to adapt to dynamics of system • Security: Malicious peers, Spread of worms • Free riding: Reputation mechanisms, Micro-payment mechanisms • Legality

CS522: Algorithmic and Economic Aspects of the Internet

CS522: Algorithmic and Economic Aspects of the Internet

Presentation Transcript

The Internet and World Wide Web

The Internet

The Internet, Intranets, and Extranets

Internet of Things

Algorithmic Game Theory and Internet Computing

Chapter 2: Economic Organization and Efficiency

Internet 应用

The Internet

Internet Safety

Algorithmic Verification of Concurrent Programs

Non-Hierarchical Sequencing Graphs

Algorithmic Game Theory and Internet Computing