Overlay Neighborhoods for Distributed Publish/Subscribe Systems

Overlay Neighborhoods for Distributed Publish/Subscribe Systems. Reza Sherafat Kazemzadeh. Supervisor: Dr. Hans-Arno Jacobsen. SGS PhD Thesis Defense, University of Toronto, September 5, 2012.

Presentation Transcript


  1. Overlay Neighborhoods for Distributed Publish/Subscribe Systems. Reza Sherafat Kazemzadeh. Supervisor: Dr. Hans-Arno Jacobsen. SGS PhD Thesis Defense, University of Toronto, September 5, 2012.

  2. Content-Based Pub/Sub
     [Figure: publishers (P) publish into the pub/sub overlay, which delivers publications to subscribers (S).]

  3. Thesis Contributions
     List of publications:
     • [ACM Surveys] Dependable publish/subscribe systems (being submitted)
     • [Middleware’12] Opportunistic Multi-Path Publication Forwarding in Pub/Sub Overlays
     • [ICDCS’12] Publiy+: A Peer-Assisted Pub/Sub Service for Timely Dissemination of Bulk Content
     • [SRDS’11] Partition-Tolerant Distributed Publish/Subscribe Systems
     • [SRDS’09] Reliable and Highly Available Distributed Publish/Subscribe Service
     • [ACM Transactions on Parallel and Distributed Systems] Reliable Message Delivery in Distributed Publish/Subscribe Systems Using Overlay Neighborhoods (being submitted)
     • [Middleware Demos/Posters’12] Introducing Publiy (being submitted)

  6. Thesis Contributions: Overlay Neighborhoods

  7. Part I: Dependability in Pub/Sub Systems
     Publications:
     • [SRDS’11] Partition-Tolerant Distributed Publish/Subscribe Systems
     • [SRDS’09] Reliable and Highly Available Distributed Publish/Subscribe Service
     • [ACM Transactions on Parallel and Distributed Systems] Reliable Message Delivery in Distributed Publish/Subscribe Systems Using Overlay Neighborhoods (being submitted)
     • [ACM Surveys] Dependable publish/subscribe systems (being submitted)
     • [Middleware Demos/Posters’12] Introducing Publiy (being submitted)

  8. Challenges of Dependability in Content-Based Pub/Sub Systems
     The “end-to-end principle” is not applicable in a pub/sub system:
     • Loose coupling between publishers and subscribers (the endpoints)
     • Endpoints cannot distinguish message loss from filtered messages; this is especially true in content-based systems that support flexible publication filtering
     [Figure: publisher P and subscriber S connected through the pub/sub middleware. A publication that is filtered out (not matching the subscription) cannot be differentiated from one that is lost.]

  9. Overlay Neighborhoods
     Primary network: an initial spanning tree
     • Brokers maintain neighborhood knowledge
     • Allows brokers to transform the overlay in a controlled manner
     d-neighborhood knowledge (d is a configuration parameter):
     • Knowledge of other brokers within distance d
     • Knowledge of forwarding paths within the neighborhood
     [Figure: nested 1-, 2-, and 3-neighborhoods.]
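
A minimal sketch of what d-neighborhood knowledge could look like, assuming the primary tree is available as an adjacency dictionary. The function name `d_neighborhood`, the dictionary layout, and the example tree are illustrative assumptions, not taken from the thesis or from Publiy. A breadth-first search collects every broker within distance d together with the tree path leading to it.

```python
from collections import deque

def d_neighborhood(tree, broker, d):
    """Return {other_broker: path} for every broker within distance d.

    `tree` is an adjacency dict for the primary spanning tree (assumed layout).
    Each path lists the brokers from `broker` to the other broker, inclusive.
    """
    paths = {broker: [broker]}
    queue = deque([broker])
    while queue:
        current = queue.popleft()
        if len(paths[current]) - 1 == d:
            continue  # already at distance d: do not expand further
        for neighbor in tree[current]:
            if neighbor not in paths:
                paths[neighbor] = paths[current] + [neighbor]
                queue.append(neighbor)
    del paths[broker]  # a broker is not part of its own neighborhood
    return paths

# Hypothetical primary tree: chain A-B-C-D-E with a branch C-F.
tree = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D", "F"],
    "D": ["C", "E"], "E": ["D"], "F": ["C"],
}
one_hop = d_neighborhood(tree, "A", 1)
two_hop = d_neighborhood(tree, "A", 2)
three_hop = d_neighborhood(tree, "A", 3)
```

In this example the 1-, 2-, and 3-neighborhoods of A grow from {B} to {B, C} to {B, C, D, F}, mirroring the nested neighborhoods on the slide.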

  10. Publication Forwarding Algorithm
      • Received publications are placed on a FIFO message queue and kept until processing is complete
      • After matching, all known subscriptions with an interest in p are identified
      • The forwarding paths of the publication within downstream neighborhoods are identified
      • The publication is sent to the closest available brokers towards matching subscribers
      [Figure: publication p arrives from upstream, waits in the queue, and is forwarded downstream through the d-neighborhood towards subscribers S.]

  11. When There Are Failures
      • A broker reconnects the overlay by creating new links to the neighbors of the failed brokers
      • Publications in the message queue are retransmitted, bypassing failed neighbors
      • Multiple concurrent failed neighbors (up to d-1) are bypassed similarly
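
The forwarding steps on slide 10 and the failure bypass on slide 11 can be sketched together as follows. This is a simplified illustration under stated assumptions: the names `closest_available` and `route`, the predicate-based subscriptions, and the example paths are hypothetical, and confirmations and link management are left out. Each subscription is matched against the publication, and the publication goes to the first non-failed broker on the known path towards that subscriber. Retransmission after a failure amounts to routing the queued publications again with the updated failure set.

```python
from collections import deque

def closest_available(path, failed):
    """First broker on `path` (nearest first) that has not failed."""
    for broker in path:
        if broker not in failed:
            return broker
    return None  # every broker known on this path has failed

def route(pub, subscriptions, paths, failed):
    """Return the set of brokers that `pub` should be sent to."""
    targets = set()
    for subscriber, matches in subscriptions.items():
        if matches(pub):
            hop = closest_available(paths[subscriber], failed)
            if hop is not None:
                targets.add(hop)
    return targets

# Hypothetical view from broker A with d = 3: known downstream paths
# towards two subscribers, nearest broker first.
paths = {"S1": ["B", "C", "D"], "S2": ["B", "E"]}
subscriptions = {
    "S1": lambda pub: pub["stock"] == "IBM",
    "S2": lambda pub: pub["change"] > -8,
}
# Publications stay queued until processing is complete.
queue = deque([{"stock": "IBM", "change": -2}, {"stock": "HP", "change": -10}])

no_failures = [sorted(route(p, subscriptions, paths, set())) for p in queue]
# B and C fail concurrently (d-1 = 2 failures): queued publications are
# routed again and bypass the failed brokers.
after_failures = [sorted(route(p, subscriptions, paths, {"B", "C"})) for p in queue]
```

Without failures the first publication goes to B only; with B and C down it is sent directly to D and E. The second publication matches no subscription and is not forwarded in either case.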

  12. Impact of Mass Failures on Throughput
      Experiment setup: 500 brokers (failures injected at random brokers)
      Measurement interval of 2 minutes (the aggregate publish rate changes depending on the number of failures)
      [Plot: expected number of deliveries without failures.]

  13. Impact of Mass Failures on Throughput (cont’d)
      [Plot: expected number of deliveries without failures versus actual deliveries with failures.]

  16. Impact of Mass Failures on Throughput (cont’d)
      [Plot: expected number of deliveries without failures versus actual deliveries with failures; deliveries are low with d=1.]

  19. Part II: Opportunistic Multi-Path Publication Forwarding
      Publications:
      • [Middleware’12] Opportunistic Multi-Path Publication Forwarding in Pub/Sub Overlays

  20. Problems in Existing Pub/Sub Systems
      • Forwarding paths in the overlay are constructed in a “fixed end-to-end” manner (little or no path diversity)
      • This results in a high number of “pure forwarding” brokers
      • Low yield (the ratio of msgs delivered over msgs sent is small) → low efficiency
      [Figure: a publication from P travels through brokers A to E towards S; only some brokers on the path have matching subscribers.]
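
As a small worked example of the yield metric named above, the following sketch computes the ratio of messages delivered to messages sent and lists the pure forwarding brokers on a path. The function names and the numbers are illustrative assumptions, not measurements from the thesis.

```python
def overlay_yield(delivered, sent):
    """Ratio of messages delivered to subscribers over messages sent in the overlay."""
    return delivered / sent if sent else 0.0

def pure_forwarders(path, brokers_with_subscribers):
    """Brokers on the path that only relay the publication (no local match)."""
    return [b for b in path if b not in brokers_with_subscribers]

# Hypothetical fixed path with matching subscribers at C and E only:
# 5 overlay messages are sent, 2 result in a delivery.
path = ["A", "B", "C", "D", "E"]
forwarders = pure_forwarders(path, {"C", "E"})
y = overlay_yield(delivered=2, sent=5)
```

Here three of the five brokers are pure forwarders and the yield is 0.4; shortening the path that a publication takes raises the yield for the same deliveries.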

  21. Multi-Path Forwarding in a Nutshell
      Actively utilize neighborhoods
      [Figure: broker A and soft links.]

  22. Different Forwarding Strategies
      • Conventional systems (strategy 0): total msgs: 6
      • Forwarding strategy 1: total msgs: 5
      • Forwarding strategy 2: total msgs: 3
      [Figure: publication p forwarded among brokers A, B, and C under each strategy.]

  23. Maximum System Throughput
      Experiment setup: 250 brokers, publish rate of 72,000 msgs/min
      • S1 outperforms S0 by 60%
      • S2 outperforms S0 by 90%

  24. Part III: Bulk Content Dissemination in Pub/Sub Systems
      Publications:
      • [ICDCS’12] Publiy+: A Peer-Assisted Publish/Subscribe Service for Timely Dissemination of Bulk Content

  25. Application Scenarios Involving Bulk Content Dissemination
      Scenarios: replication within a CDN, social networks, file synchronization, P2P file sharing, distribution of software updates
      Fast replication of content (video clips, pictures):
      • Scalability
      • Reactive delivery
      • Selective delivery

  26. Hybrid Architecture: A Case for a Peer-Assisted Design
      Control layer (for metadata):
      • P/S broker overlay
      • Distributed repository maintaining users’ subscriptions
      Data layer (for actual data):
      • Form a peer swarm
      • Exchange blocks of data
      [Figure: subscribers issue Subscribe requests to the brokers in the control layer and exchange data in the data layer.]

  27. Scalability w.r.t. Number of Subscribers
      Network setup: 300 and 1000 clients; 1 source publishing 100 MB of content

  28. Conclusion
      • We introduced the notion of overlay neighborhoods in distributed pub/sub systems
      • Neighborhoods expose brokers’ knowledge of nearby neighbors and of the publication forwarding paths that cross these neighborhoods
      • We used neighborhoods in different ways:
        • Passive use of neighborhoods for ensuring reliable and ordered delivery
        • Active use of neighborhoods for multi-path publication forwarding
        • Bulk content dissemination

  29. Thanks for your attention!

  30. EXTRAS BONUS SLIDES if needed!

  31. Overlay Neighborhoods

  32. Content-Based Publish/Subscribe
      Stock quote dissemination application
      [Figure: publishers (P) and subscribers (S) attached to a pub/sub overlay spanning NY, London, and Toronto. Trader 1 subscribes with sub = [STOCK=IBM]; Trader 2 subscribes with sub = [CHANGE>-8%].]
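
To make the two example subscriptions concrete, here is a minimal sketch of content-based matching. It assumes a subscription is a conjunction of (attribute, operator, value) predicates and a publication is a dictionary of attribute values; this representation and the name `matches` are illustrative assumptions, not the matching engine used in the thesis.

```python
import operator

# Supported comparison operators (an assumed, deliberately small set).
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

def matches(subscription, publication):
    """True if every predicate of the subscription holds for the publication."""
    return all(
        attr in publication and OPS[op](publication[attr], value)
        for attr, op, value in subscription
    )

trader1 = [("STOCK", "=", "IBM")]      # sub = [STOCK=IBM]
trader2 = [("CHANGE", ">", -8.0)]      # sub = [CHANGE>-8%]

ibm_quote = {"STOCK": "IBM", "CHANGE": -9.5}
hp_quote = {"STOCK": "HP", "CHANGE": -3.0}

results = {
    "trader1_ibm": matches(trader1, ibm_quote),
    "trader2_ibm": matches(trader2, ibm_quote),
    "trader1_hp": matches(trader1, hp_quote),
    "trader2_hp": matches(trader2, hp_quote),
}
```

The IBM quote reaches only Trader 1 (its change of -9.5% is filtered out for Trader 2), while the HP quote reaches only Trader 2.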

  33. System Architecture
      Tree dissemination networks: one path from source to destination
      • Pros:
        • Simple, loop-free
        • Preserves publication order (difficult for non-tree content-based P/S)
      • Cons:
        • Trees are highly susceptible to failures
      Primary tree: the initial spanning tree that is formed as brokers join the system
      • Brokers maintain neighborhood knowledge
      • Allows brokers to reconfigure the overlay on the fly after failures
      ∆-neighborhood knowledge: ∆ is a configuration parameter that ensures handling of ∆-1 concurrent failures (worst case)
      • Knowledge of other brokers within distance ∆ (join algorithm)
      • Knowledge of routing paths within the neighborhood (subscription propagation algorithm)
      [Figure: nested 1-, 2-, and 3-neighborhoods.]

  34. Overlay Disconnections
      When there are d or more concurrent failures:
      • Publication delivery may be interrupted
      • No publication loss
      [Figure: a failed chain of d brokers. Some subtrees remain connected; the subtrees on either side of the failed chain are disconnected.]

  35. Experimental Evaluation
      Studied various aspects of the system’s operation:
      • Impact of failures/recoveries on delivery delay
      • Impact of failures on other brokers
      • Size of d-neighborhoods
      • Likelihood of disconnections
      • Impact of disconnections on system throughput
      Discussed next

  36. Publication Forwarding in Absence of Overlay Fragments
      • Forwarding only uses subscriptions accepted by brokers.
      • Steps in forwarding of publication p:
        • Identify the anchors of accepted subscriptions that match p
        • Determine active connections towards the matching subscriptions’ anchors
        • Send p on those active connections and wait for confirmations
        • If there are local matching subscribers, deliver to them
        • If no downstream matching subscriber exists, issue a confirmation towards P
        • Once confirmations arrive, discard p and send a conf towards P
      [Figure: chain P, E, D, C, B, A, S. Publication p travels hop by hop towards S, confirmations (conf) travel back towards P, and brokers deliver to their local subscribers.]
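
The forwarding steps above can be sketched as a recursive walk over the tree. This is a simplified, synchronous illustration: the names `forward` and `has_match_below`, the `children` dictionary, and the example topology (including the extra branch X) are assumptions, and a function return stands in for a confirmation arriving. A broker sends p only towards subtrees that contain an anchor of a matching subscription, delivers locally where it is itself an anchor, and confirms upstream once everything downstream has confirmed.

```python
def has_match_below(broker, children, anchors):
    """True if `broker` or any broker below it anchors a matching subscription."""
    return broker in anchors or any(
        has_match_below(c, children, anchors) for c in children.get(broker, [])
    )

def forward(broker, pub, children, anchors, log):
    """Process `pub` at `broker`; returning models the confirmation sent upstream."""
    log.append(("recv", broker))
    if broker in anchors:
        log.append(("deliver", broker))  # local matching subscribers
    for child in children.get(broker, []):
        if has_match_below(child, children, anchors):
            # Send p on this connection and wait for its confirmation.
            forward(child, pub, children, anchors, log)
    # All downstream confirmations arrived (or none were needed):
    # discard p and confirm towards the publisher.
    log.append(("conf", broker))

# Hypothetical tree rooted at the publisher's broker E, with a side branch X.
children = {"E": ["D"], "D": ["C", "X"], "C": ["B"], "B": ["A"]}
anchors = {"C", "A"}  # brokers whose local subscriptions match p

log = []
forward("E", "p", children, anchors, log)
confirmations = [b for kind, b in log if kind == "conf"]
deliveries = [b for kind, b in log if kind == "deliver"]
```

Confirmations are issued in the order A, B, C, D, E, so a broker discards p only after every broker below it has done so; branch X never receives p because nothing below it matches.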

  37. Publication Forwarding in Presence of Overlay Partitions
      • Key forwarding invariant to ensure reliability: no stream of publications is delivered to a subscriber after being forwarded by brokers that have not accepted its subscription.
      • Case 1: Subscription s has been accepted with no pid. It is safe to bypass intermediate brokers.
      [Figure: chain P, E, D, C, B, A, S with publications p and confirmations (conf) bypassing intermediate brokers; brokers deliver to their local subscribers.]

  38. Publication Forwarding (cont’d)
      • Case 2: Subscription s has been accepted with some pid.
        • Case 2a: The publisher’s local broker has accepted s and we ensure that all intermediate forwarding brokers have also done so → it is safe to deliver publications from sources beyond the partition.
      [Figure: chain P, E, D, C, B, A, S with publications p and confirmations (conf).]

  39. Publication Forwarding (cont’d)
      • Case 2a (cont’d): Depending on when the link in question was established, either recovery or subscription propagation ensures that C accepts s prior to receiving p.

  40. Publication Forwarding (cont’d)
      • Case 2: Subscription s is accepted with some pid tags.
        • Case 2b: The publisher’s broker has not accepted s: it is unsafe to deliver publications from this publisher (invariant).
      [Figure: chain P, E, D, C, B, A, S. Publications are tagged with the pid; s was accepted at S with the same pid tag.]

  41. Overlay Fragments
      • When the primary tree is set up, brokers communicate with their immediate neighbors in the primary tree through FIFO links.
      • Overlay fragments: broker crashes or link failures create “fragments”, and some neighbor brokers “on the fragment” become unreachable from neighboring brokers.
      • Active connections: at each point, a broker tries to maintain a connection to its closest reachable neighbor in the primary tree.
      • Only active connections are used by brokers.
      [Figure: chain P, F, E, D, C, B, A, S. D is on the fragment; a fragment detector at C records pid1=<C, {D}> and C keeps an active connection to E, a broker beyond the fragment.]

  42. Overlay Fragments: 2 Adjacent Failures
      • What if there are more failures, particularly adjacent failures?
      • If ∆ is large enough, the same process can be used for larger fragments.
      [Figure: chain P, F, E, D, C, B, A, S. D and E are on the fragment; C records pid1=<C, {D}> and pid2=<C, {D, E}> and keeps an active connection to F, a broker beyond the fragment.]

  43. Overlay Fragments: ∆ Adjacent Failures
      • Worst-case scenario: ∆-neighborhood knowledge is not sufficient to reconnect the overlay.
      • Brokers “on” and “beyond” the fragment are unreachable.
      [Figure: chain P, F, E, D, C, B, A, S. D, E, and F are on the fragment; C records pid1=<C, {D}>, pid2=<C, {D, E}>, and pid3=<C, {D, E, F}> and has no new active connection.]
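
The three fragment scenarios on slides 41 to 43 can be reproduced with a short sketch. It assumes ∆ = 3 (consistent with the third adjacent failure leaving no active connection) and models a fragment identifier as a pair of the detecting broker and the set of adjacent failed brokers, following the pid=<C, {…}> notation. The function name `fragment_view` and the path representation are hypothetical.

```python
def fragment_view(broker, path, failed, delta):
    """Return (fragment_id, active_connection) as seen by `broker`.

    `path` lists the brokers on the primary tree path leaving `broker`,
    nearest first; only the first `delta` entries are known to the broker.
    fragment_id is (broker, frozenset of adjacent failed brokers) or None.
    active_connection is the closest reachable known broker or None.
    """
    known = path[:delta]
    on_fragment = []
    for b in known:
        if b not in failed:
            break
        on_fragment.append(b)
    pid = (broker, frozenset(on_fragment)) if on_fragment else None
    if len(on_fragment) < len(known):
        active = known[len(on_fragment)]  # first broker beyond the fragment
    else:
        active = None  # neighborhood knowledge exhausted
    return pid, active

path_from_c = ["D", "E", "F", "P"]  # towards the publisher, as in the figures
one = fragment_view("C", path_from_c, {"D"}, delta=3)
two = fragment_view("C", path_from_c, {"D", "E"}, delta=3)
three = fragment_view("C", path_from_c, {"D", "E", "F"}, delta=3)
```

With one or two adjacent failures C reconnects to E or F respectively; with three adjacent failures its 3-neighborhood is exhausted and no new active connection can be established.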

  44. Fragments
      Brokers are connected to their closest reachable neighbors and are aware of nearby fragment identifiers.
      • How does this affect end-to-end connectivity?
      For any pair of brokers, a fragment on the primary path between them is:
      • An “island” if the end-to-end brokers are reachable through a sequence of active connections
      • A “barrier” if the end-to-end brokers are unreachable through any sequence of active connections
      [Figure: two chains P, F, E, D, C, B, A, S from source to destination, one with an island fragment and one with a barrier fragment.]

  45. Store-and-Forward
      • A copy is first preserved on disk
      • Intermediate hops send an ACK to the previous hop after preserving the copy
      • ACKed copies can be dismissed from disk
      • Upon failures, unacknowledged copies survive and are retransmitted after recovery
      • This ensures reliable delivery but may cause delays while the machine is down
      [Figure: a publication P forwarded hop by hop from source to destination, with an ack returned at each hop.]
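
A minimal sketch of the store-and-forward steps listed above, with an in-memory dictionary standing in for the on-disk copy. The class `Hop`, the function `transmit`, and the `up` flag are illustrative assumptions; a real broker would persist to stable storage and exchange ACKs over the network.

```python
class Hop:
    """One broker on the path; `disk` stands in for persistent storage."""

    def __init__(self, name):
        self.name = name
        self.disk = {}   # msg_id -> message copy
        self.up = True   # False while the machine is down

    def receive(self, msg_id, msg):
        """Preserve the copy first, then acknowledge. Returns True as the ACK."""
        if not self.up:
            return False  # a crashed hop cannot preserve or ACK anything
        self.disk[msg_id] = msg
        return True

def transmit(sender, receiver, msg_id):
    """Hand one stored message to the next hop; dismiss the copy once ACKed."""
    acked = receiver.receive(msg_id, sender.disk[msg_id])
    if acked:
        del sender.disk[msg_id]
    return acked

a, b = Hop("A"), Hop("B")
a.disk["m1"] = "publication 1"

b.up = False
first_try = transmit(a, b, "m1")   # B is down: no ACK, A keeps its copy
kept_on_a = "m1" in a.disk

b.up = True
second_try = transmit(a, b, "m1")  # after recovery, A retransmits
```

The unacknowledged copy survives on A while B is down, which is what makes delivery reliable, and the delay until `second_try` illustrates the cost noted in the last bullet.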

  46. Mesh-Based Overlay Networks [Snoeren et al., SOSP 2001]
      • Use a mesh network to concurrently forward msgs on disjoint paths
      • Upon failures, the msg is delivered using alternative routes
      • Pros: minimal impact on delivery delay
      • Cons: imposes additional traffic and the possibility of duplicate delivery
      [Figure: a publication P forwarded from source to destination over multiple paths.]

  47. Replica-Based Approach [Bhola et al., DSN 2002]
      • Replicas are grouped into virtual nodes
      • Replicas have identical routing information
      [Figure: physical machines grouped into a virtual node.]

  48. Replica-Based Approach [Bhola et al., DSN 2002] (cont’d)
      • We compare against this approach

  49. Problems with a Single Overlay Tree
      • A tree provides no routing diversity
      • Overloaded root: all traffic goes through a single broker
      • Underutilization: not all available capacity is effectively used
      Tree: single-path connectivity is not suitable for diverse forwarding patterns
      [Figure: a tree with an overloaded root and unutilized bandwidth capacity.]

  50. Related Work: Structured Topologies
      • A topology is an interconnection between brokers:
        • The topology is relatively stable: long-term connections
        • Most commonly a global or per-publisher spanning tree
      • Topology adaptation changes the topology based on:
        • Traffic patterns [1,2], optimizing a cost function
        • Maintaining the acyclic property by adding and removing links
      • Advantages:
        • A fixed topology enables high-throughput connections
        • Routes may be improved from a “coarse-grained” system-wide perspective
      • Disadvantages:
        • Routes may never be optimal for individual broker pairs
        • Introduces pure forwarding brokers
        • Diversity of routing is not accounted for
      [Figure: Tree A reconfigured into Tree A’.]
      [1] Virgillito, A., Beraldi, R., Baldoni, R.: On event routing in content-based publish/subscribe through dynamic networks. In: FTDCS. (2003)
      [2] Virgillito, A., Beraldi, R., Baldoni, R.: On event routing in content-based publish/subscribe through dynamic networks. In: FTDCS. (2003)
