
Collaborative Content Delivery


Presentation Transcript


  1. .: DRAFT :. Collaborative Content Delivery: A peer-to-peer solution for web-based publish/subscribe
     Werner Vogels, Robbert van Renesse, Ken Birman
     Dept. of Computer Science, Cornell University

  2. Presentation duality …
     • The case for Collaborative Content Delivery vs.
     • The innovative technology used to build the system
     • Spectacularly scalable technology
     • Secure, reliable, robust & fast
     • A solution to many distributed management problems

  3. Late night reading
     N.T.J. Bailey, The Mathematical Theory of Infectious Diseases and its Applications, Second Edition, Hafner Press, 1975

  4. The Problem
     • Access to real-time information at syndicated news sites is highly inefficient
     • An estimated 70%-80% of the bandwidth is wasted on redundant transport, both at the consumer and at the publisher
     • Consumers frequently return to the website to receive timely updates

  5. Isn’t this solved already?
     • RSS: channels provide summaries for processing by bots
     • But the mechanism remains “pull”
     • HTTP delta encoding should reduce bandwidth cost
     • News feeds from major vendors
     • “Push” is the right model for frequently changing data that must be delivered in a timely manner
     • Proprietary formats and high fees
     • Email summaries as a cheap alternative
     • Still high bandwidth cost at the publisher
     • Hybrid “push/pull” by organizations exploiting distributed content delivery

  6. Scale is a major obstacle
     • No coordinated action by syndication sites to provide a shared information-push infrastructure
     • The one-to-many technologies currently in use are inherently unscalable
     • No technology is available that can deliver data from thousands of publishers to millions of subscribers in real time

  7. We can do better
     • Current push solutions fail to exploit the collaborative power of the Internet
     • Ideally, a publisher injects a single update into the world and all interested subscribers receive it
     • In this model all consumers collaborate to route the information to the right subscribers
     • The information arrives at all desktops within tens of seconds of publication

  8. Peer-to-Peer Solution
     • P2P is the only approach to a cost-effective, scalable solution
     • Subscribers weave an ad-hoc infrastructure for subscription-based routing
     • Scalable, autonomous & decentralized management
     • High level of robustness and reliability in message delivery
     • Authentication of publishers

  9. Emerging technologies
     • Astrolabe, CAN, Chord and Pastry are emerging research technologies
     • Astrolabe is the furthest along in:
       • Scalability
       • Security integration
       • Manageability
       • Firewall, proxy and NAT support
     • A complete technology that we are now using to develop applications

  10. Astrolabe/Mariner
      • A system for ultra-scalable, distributed state management
      • Robust, through the use of epidemic techniques
      • Scalable, through the use of information aggregation and fusion
      • Secure, through certificates
      • Flexible, through secure mobile code
      • Simulated, emulated, tested and deployed

  11. Astrolabe: Robust and Scalable Technology for Distributed System Monitoring, Management and Data Mining

  12. Distributed Systems Management
      • Is extremely important in the deployment of large systems
      • Scalable management of applications and systems is still a major challenge
      • Management technology needs to be integrated into applications
      • The management subsystem is often more complex than the application itself

  13. Astrolabe
      • Information/state management system
      • Monitors the dynamically changing state of sets of distributed resources
      • Reports summaries to its consumers
      • Uses information hierarchies to organize the data
      • Uses aggregation techniques to continuously compute the summary nodes in the system

  14. Current use of Mariner
      • Monitoring and control of applications, systems and infrastructure
      • Resource discovery
      • Collaboration management
      • Coordination of distributed tasks
      • Edge-caching control
      • Dynamic CDN management

  15. Intuitively
      • Mariner can be seen as a large database holding information about the global system
      • None of this information resides on a single server
      • Each principal has a row in this virtual database, which it is allowed to update with <attribute, value> pairs
      • A principal can directly access only the rows of the other nodes in its own zone and of the intermediate zones in the hierarchy on its path to the root
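The access rule above can be sketched as a predicate over zone names. This is a minimal illustration that assumes hypothetical path-style names, such as `/usa/nj/host1` for a leaf row and `/usa/nj` for its zone. The helpers `parent`, `ancestors`, `can_read` and `can_write` are invented for this sketch and are not Mariner's API.

```python
# Hypothetical naming scheme: every row is named by its path in the hierarchy,
# e.g. "/usa/nj/host1" is a leaf row stored in zone "/usa/nj".


def parent(path: str) -> str:
    """Zone that stores the row named `path` ("/" for top-level rows)."""
    head = path.rstrip("/").rsplit("/", 1)[0]
    return head or "/"


def ancestors(path: str) -> list[str]:
    """Zones on the path from a leaf to the root, nearest first."""
    parts = path.strip("/").split("/")
    zones = ["/" + "/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
    return zones + ["/"]


def can_read(principal: str, row: str) -> bool:
    # A principal sees only rows stored in the zones on its own path to the root:
    # its siblings, plus the aggregated rows of the zones beside each ancestor.
    return parent(row) in ancestors(principal)


def can_write(principal: str, row: str) -> bool:
    # Each principal may update only its own row.
    return principal == row


me = "/usa/nj/host1"
print(can_read(me, "/usa/nj/host2"))  # True: sibling leaf in the same zone
print(can_read(me, "/usa/ca"))        # True: aggregated row of a sibling zone
print(can_read(me, "/usa/ca/host9"))  # False: leaf inside another zone
```

Because a node keeps only the tables on its own path, its state grows with the depth of the hierarchy rather than with the total number of nodes.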

  16. Mariner in a single zone
      • The lowest level of the hierarchy can consist of nodes, or of finer-grained entities if the application requires it
      • A zone's security key is needed to add a new column; a user key is needed to update a row

  17. Scalability through Hierarchy
      • Leaves are organized into zones
      • Each leaf has a self-managed attribute list
      • The base zone is the collection of the individual attribute lists of its leaves
      • Each intermediate zone is the collection of attribute lists constructed by aggregating the information in its child zones
      • Each list has some basic attributes that Mariner uses to manage itself, such as contact lists, timestamps, etc.
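The bottom-up construction of zone attribute lists can be sketched in a few lines. The sketch assumes two hypothetical attributes, `nmembers` and `min_load`, with a fixed aggregation. In Mariner the aggregation is programmable and computed by gossiping agents rather than by one recursive call. The zone names echo the New Jersey / San Francisco example hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    name: str
    # Leaf: self-managed attributes. Intermediate zone: filled in by aggregate().
    attrs: dict = field(default_factory=dict)
    children: list["Zone"] = field(default_factory=list)


def aggregate(zone: Zone) -> dict:
    """Recompute the attribute list of `zone` bottom-up from its child zones."""
    if not zone.children:
        return zone.attrs  # leaves manage their own attributes
    rows = [aggregate(child) for child in zone.children]
    # Hypothetical aggregation: member count and lowest load in the subtree.
    zone.attrs = {
        "nmembers": sum(r["nmembers"] for r in rows),
        "min_load": min(r["min_load"] for r in rows),
    }
    return zone.attrs


def leaf(name: str, load: float) -> Zone:
    return Zone(name, {"nmembers": 1, "min_load": load})


new_jersey = Zone("New Jersey", children=[leaf("h1", 0.7), leaf("h2", 0.2)])
san_francisco = Zone("San Francisco", children=[leaf("h3", 0.5)])
root = Zone("root", children=[new_jersey, san_francisco])
print(aggregate(root))   # {'nmembers': 3, 'min_load': 0.2}
print(new_jersey.attrs)  # {'nmembers': 2, 'min_load': 0.2}
```

Because each summary has the same shape as a leaf's list, the same function works unchanged at every level of the tree.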

  18. Simple Hierarchy
      [Figure: example zone hierarchy with zones New Jersey and San Francisco]

  19. Information Aggregation
      • Aggregation functions are programmable
      • They are written in a subset of SQL
      • The code is embedded in aggregation function certificates (AFCs)
      • A signed certificate is installed into an attribute list
      • AFCs are used to construct (new) attributes in the zones of the hierarchy
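A signed SQL aggregation function can be illustrated with the Python standard library. This is only a stand-in:
- SQLite plays the role of Mariner's SQL subset.
- An HMAC over a hypothetical shared `ZONE_KEY` stands in for real certificate signatures.

The `zone` table schema and the helpers `sign`, `make_afc` and `evaluate_afc` are assumptions made for this sketch.

```python
import hashlib
import hmac
import sqlite3

# Stand-in for a zone's signing key; real AFCs use certificates, not a shared secret.
ZONE_KEY = b"hypothetical-zone-key"


def sign(code: str) -> str:
    return hmac.new(ZONE_KEY, code.encode(), hashlib.sha256).hexdigest()


def make_afc(name: str, code: str) -> dict:
    """Package aggregation code as a (hypothetical) signed certificate."""
    return {"name": name, "code": code, "sig": sign(code)}


def evaluate_afc(afc: dict, child_rows: list[tuple]) -> dict:
    """Verify the AFC, then run its query over the child rows of one zone."""
    if not hmac.compare_digest(sign(afc["code"]), afc["sig"]):
        raise ValueError(f"rejecting AFC {afc['name']!r}: bad signature")
    db = sqlite3.connect(":memory:")
    try:
        db.execute("CREATE TABLE zone (name TEXT, load REAL, nmembers INTEGER)")
        db.executemany("INSERT INTO zone VALUES (?, ?, ?)", child_rows)
        cur = db.execute(afc["code"])
        columns = [d[0] for d in cur.description]
        return dict(zip(columns, cur.fetchone()))
    finally:
        db.close()


afc = make_afc("load_summary",
               "SELECT MIN(load) AS min_load, SUM(nmembers) AS nmembers FROM zone")
rows = [("h1", 0.7, 1), ("h2", 0.2, 1), ("h3", 0.5, 1)]
print(evaluate_afc(afc, rows))  # {'min_load': 0.2, 'nmembers': 3}
```

The signature check runs before any code is executed. An agent therefore never evaluates aggregation code that the zone's key holder did not approve.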

  20. Epidemic Dissemination
      • Each Astrolabe instance maintains all the zones on its path to the root
      • There are no centralized servers for intermediate zones
      • Consequently, each instance has a copy of the root zone
      • Replication is achieved through gossip techniques
      • Guarantees eventual consistency
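The claim that gossip replication yields eventual consistency can be illustrated with push-pull anti-entropy. This is a single-process simulation under stated assumptions:
- Every row carries a timestamp set by its owner.
- The highest timestamp wins a merge.
- Peers are chosen uniformly at random.

It is not Mariner's wire protocol. The helper names are hypothetical.

```python
import random

# Each replica maps a row key to (timestamp, value); the highest timestamp wins.


def merge(mine: dict, theirs: dict) -> None:
    """Adopt every row for which the peer holds a newer version."""
    for key, (ts, value) in theirs.items():
        if key not in mine or mine[key][0] < ts:
            mine[key] = (ts, value)


def gossip_round(replicas: list[dict], rng: random.Random) -> None:
    """Every replica exchanges state with one peer chosen uniformly at random."""
    n = len(replicas)
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # skip ourselves
        merge(replicas[i], replicas[j])  # pull the peer's newer rows
        merge(replicas[j], replicas[i])  # push ours back


def rounds_until_consistent(replicas, rng, max_rounds=100):
    """Gossip until all replicas hold identical state; None if max_rounds is hit."""
    for r in range(1, max_rounds + 1):
        gossip_round(replicas, rng)
        if all(rep == replicas[0] for rep in replicas):
            return r
    return None


rng = random.Random(42)
replicas = [{f"node{i}": (1, f"load={i}")} for i in range(64)]
print(rounds_until_consistent(replicas, rng))  # a small number of rounds
```

No replica ever waits for a coordinator. Any update, once made, keeps spreading until every copy agrees.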

  21. AFC propagation
      • The output of an AFC includes a copy of the AFC itself, so the AFC is copied into the parent zone
      • In this way it reaches the root and the leaves of other zones
      • Adoption: agents check their ancestors' attribute lists to find new AFCs
      • AFCs spread through the system on the order of tens of seconds
      • Certificates have an expiration date; unless they are refreshed, aggregation eventually halts
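Adoption and expiry can be sketched as a small bookkeeping step that an agent might run after each gossip exchange. The sketch rests on three assumptions:
- Each AFC carries a hypothetical issue time and expiry time.
- A newer issue of the same AFC replaces an older one.
- Signature checking is omitted.

The `AFC` record and the `adopt` function are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AFC:
    name: str       # attribute the function computes
    issued: float   # issue time; a newer issue replaces an older one
    expires: float  # aggregation stops after this time unless refreshed
    code: str


def adopt(installed: dict, found_in_ancestors: list, now: float) -> dict:
    """Install the newest unexpired AFCs found in ancestor zones; drop expired ones."""
    for afc in found_in_ancestors:
        current = installed.get(afc.name)
        if afc.expires > now and (current is None or afc.issued > current.issued):
            installed[afc.name] = afc
    for name in [n for n, a in installed.items() if a.expires <= now]:
        del installed[name]  # expired and not refreshed: this aggregation halts
    return installed


installed = {}
adopt(installed, [AFC("load", 0, 100, "v1"), AFC("load", 10, 100, "v2"),
                  AFC("old", 0, 5, "x")], now=20)
print({n: a.code for n, a in installed.items()})  # {'load': 'v2'}
adopt(installed, [], now=150)
print(installed)                                   # {}
```

Expiry is a safety valve. An aggregation whose owner stops refreshing it disappears by itself, with no explicit removal message.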

  22. I’ll skip
      • Aggregation function details
      • Mobile code details
      • Eventual consistency
      • Certificates
      • Authentication
      • Firewalls & NATs

  23. Robustness through Gossip
      • Use of epidemic techniques to disseminate data and AFCs
      • Pure peer-to-peer communication
      • Fully autonomous progress
      • Actions based on probability theory
      • Robustness improves with scale
      • Fixed low overhead, independent of scale
      • Control as well as data transport
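The robustness and fixed-overhead claims can be explored with a push-style rumor-spreading simulation. In each round, every informed node sends exactly one message to a random peer, and a fixed fraction of messages is lost. The uniform peer choice, the independent loss model and the parameter values are assumptions for illustration. The results describe generic epidemic behaviour, not measured Mariner performance.

```python
import random


def spread(n: int, loss: float, rng: random.Random, max_rounds: int = 1000):
    """Push gossip: each informed node sends the update to one random peer per round.

    Returns the number of rounds until all n nodes are informed, or None.
    """
    informed = [False] * n
    informed[0] = True  # the publisher injects the update once
    count, rounds = 1, 0
    while count < n and rounds < max_rounds:
        rounds += 1
        senders = [i for i in range(n) if informed[i]]
        for i in senders:
            peer = rng.randrange(n)  # may pick itself: a wasted message
            if rng.random() < loss:
                continue  # message lost in the network
            if not informed[peer]:
                informed[peer] = True
                count += 1
    return rounds if count == n else None


rng = random.Random(7)
for n in (100, 1_000, 10_000):
    print(n, spread(n, loss=0.2, rng=rng))  # rounds grow slowly with n
```

Each node sends one message per round whatever the system size, so per-node load stays fixed. Losing a fifth of all messages only adds a few rounds, because no single message is critical.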

  24. Gossip
      • Conceptually: each zone periodically picks another zone at random and the two exchange their state
      • In practice it is slightly more complex, because there are virtual zones …

  25. Gossip target selection
      • Each instance updates the attributes it issues and evaluates the dependent AFCs
      • An agent (instance) gossips on behalf of those zones for which it is a contact, at a configurable rate
      • At each level, it picks a child at random from the contact list and exchanges state
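Per-level target selection can be sketched as follows. It assumes a hypothetical `levels` structure with one entry per zone on the agent's path to the root. Each entry records which child the agent belongs to and which agents act as contacts for each child. The gossip interval itself is left out; an agent would call `pick_targets` once per configured period.

```python
import random


def pick_targets(agent: str, levels: list, rng: random.Random) -> list:
    """Choose one gossip partner per level on the agent's path to the root.

    `levels` holds (zone, own_child, contacts) tuples from the leaf zone up to the root,
    where `contacts` maps each child of `zone` to the agents acting as its contacts.
    """
    targets = []
    for zone, own_child, contacts in levels:
        if agent not in contacts[own_child]:
            continue  # only contacts of our child zone gossip at this level
        others = sorted(c for c in contacts if c != own_child)
        if others:
            child = rng.choice(others)
            targets.append((zone, rng.choice(contacts[child])))
    return targets


levels = [
    ("/usa/nj", "h1", {"h1": ["h1"], "h2": ["h2"]}),
    ("/usa", "nj", {"nj": ["h1", "h2"], "sf": ["h3"]}),
    ("/", "usa", {"usa": ["h3"], "eu": ["h9"]}),
]
print(pick_targets("h1", levels, random.Random(0)))
# [('/usa/nj', 'h2'), ('/usa', 'h3')]  (h1 is not a contact for "usa" at the root)
```

Restricting higher-level gossip to a few contacts keeps traffic in the upper zones bounded while every agent still gossips in its own leaf zone.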

  26. Membership
      • Failure detection
        • If no update has been seen from an agent within time Tfail, remove it from the system
      • Integration
        • After partitions, crashes, etc., renegade trees can form
        • Broadcast, multicast and hints are used to discover other agents
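The Tfail rule can be sketched as a timeout-based failure detector. The sketch assumes that each agent's row carries a version counter that the agent increments on every update. Only a newer version counts as "an update seen", so stale copies relayed by gossip do not keep a dead agent alive. The class name, method names and injectable clock are hypothetical. Reintegration after partitions is not covered.

```python
import time


class FailureDetector:
    """Drop agents whose rows have not advanced within t_fail seconds."""

    def __init__(self, t_fail: float, clock=time.monotonic):
        self.t_fail = t_fail
        self.clock = clock
        self.version = {}    # agent -> highest row version seen
        self.last_seen = {}  # agent -> local time that version arrived

    def observe(self, agent: str, version: int) -> None:
        # Only a *newer* version counts as an update; stale gossip copies do not.
        if version > self.version.get(agent, -1):
            self.version[agent] = version
            self.last_seen[agent] = self.clock()

    def sweep(self) -> list:
        """Remove and return agents not updated within t_fail."""
        now = self.clock()
        failed = [a for a, t in self.last_seen.items() if now - t > self.t_fail]
        for agent in failed:
            del self.version[agent]
            del self.last_seen[agent]
        return failed


now = [0.0]  # fake clock for the example
fd = FailureDetector(t_fail=10.0, clock=lambda: now[0])
fd.observe("a", 1)
fd.observe("b", 1)
now[0] = 5.0
fd.observe("a", 1)  # stale copy: does not refresh "a"
fd.observe("b", 2)  # fresh update from "b"
now[0] = 12.0
print(fd.sweep())   # ['a']
```

Using local arrival time rather than the sender's clock means agents need no synchronized clocks to detect failures.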

  27. Subscription routing
      • At the leaves, subscribers store their subscription information
      • Aggregation functions combine the subscriptions of participants into subscriptions for the zone
      • Publishers use zone.send(subscription, data); the data is forwarded into a zone only if it has children whose subscriptions match
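Subscription aggregation and pruned forwarding can be sketched with topic sets. Several things here are assumptions:
- A subscription is an exact topic string, and a zone's subscription is the union of its children's.
- Real subscriptions could be richer predicates summarized approximately.
- `SubscriptionZone.send` only mirrors the slide's `zone.send(subscription, data)` in spirit.
- In-process recursion stands in for messages between routing nodes.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriptionZone:
    name: str
    topics: set = field(default_factory=set)  # leaf: own subscriptions
    children: list = field(default_factory=list)

    def aggregate(self) -> set:
        """Zone subscription = union of its children's subscriptions."""
        if self.children:
            self.topics = set().union(*(c.aggregate() for c in self.children))
        return self.topics

    def send(self, topic: str, data) -> list:
        """Forward only into subtrees whose aggregated subscriptions match."""
        if topic not in self.topics:
            return []  # nobody below is interested: prune this subtree
        if not self.children:
            return [(self.name, data)]  # deliver at the subscriber
        delivered = []
        for child in self.children:
            delivered.extend(child.send(topic, data))
        return delivered


root = SubscriptionZone("root", children=[
    SubscriptionZone("New Jersey", children=[SubscriptionZone("h1", {"sports"}),
                                             SubscriptionZone("h2", {"tech"})]),
    SubscriptionZone("San Francisco", children=[SubscriptionZone("h3", {"tech", "world"})]),
])
root.aggregate()
print(root.send("tech", "item-1"))     # [('h2', 'item-1'), ('h3', 'item-1')]
print(root.send("weather", "item-2"))  # []
```

The publisher injects one item at the top; whole subtrees with no matching subscriber never see it.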

  28. Routing infrastructure
      • Each zone dynamically selects 2-3 routing nodes, using AFCs based on various load factors
      • These nodes receive news items for the children in their zone
      • Forwarding is based on the individual subscription information
      • Redundancy is used to achieve robustness and reliability
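Router selection and redundant delivery can be sketched together. The sketch reduces a member's load factors to a single hypothetical number and picks the k least-loaded members; in the system this choice is made by an AFC. It also assumes that every news item carries a unique identifier. With that identifier, a subscriber can discard the duplicate copies that redundant routers inevitably produce. All names are illustrative.

```python
import heapq


def select_routers(loads: dict, k: int = 2) -> list:
    """Pick the k least-loaded members of a zone (ties broken by name)."""
    smallest = heapq.nsmallest(k, ((load, name) for name, load in loads.items()))
    return [name for _, name in smallest]


class Subscriber:
    """Receives the same item from several routers; delivers it only once."""

    def __init__(self):
        self.seen = set()
        self.items = []

    def receive(self, item_id: str, data) -> bool:
        if item_id in self.seen:
            return False  # duplicate from a redundant router
        self.seen.add(item_id)
        self.items.append(data)
        return True


routers = select_routers({"h1": 0.9, "h2": 0.1, "h3": 0.4, "h4": 0.4}, k=3)
print(routers)  # ['h2', 'h3', 'h4']

sub = Subscriber()
for router in routers:  # every router forwards the item
    sub.receive("news-42", "headline")
print(sub.items)  # ['headline']
```

Sending through two or three routers costs a few duplicate messages. In exchange, delivery survives the failure of any single router.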

  29. Summary
