
A Virtual Circuit Multicast Transport Protocol (VCMTP) for Scientific Data Distribution

Jie Li and Malathi Veeraraghavan, University of Virginia; Steve Emmerson, University Corporation for Atmospheric Research; Robert D. Russell, University of New Hampshire. April 23, 2013.

Presentation Transcript


  1. A Virtual Circuit Multicast Transport Protocol (VCMTP) for Scientific Data Distribution • Jie Li and Malathi Veeraraghavan, University of Virginia • Steve Emmerson, University Corporation for Atmospheric Research • Robert D. Russell, University of New Hampshire • April 23, 2013 • This work was supported by NSF grants OCI-1038058 and OCI-1127340, and DOE grants DE-SC002350 and DE-SC0007341

  2. I. Case Study of a Scientific Data Distribution Application

  3. Background • Internet Data Distribution (IDD) project • Developed by the University Corporation for Atmospheric Research (UCAR) • Distributes real-time meteorological data • Data generation rate: 10 GB/hour • Subscriber base: 170 institutions • Distribution software: Local Data Manager (LDM)

  4. Question • Which of these network services is best suited for IDD data? • IP-routed service + unicast TCP (current mechanism) • Static circuits (leased lines): if the data flow is continuous, is this an option? • Scheduled dynamic circuit service (DCS): if the data flows are long-lived, is this an option? • P2P • Multicast

  5. To answer this question • Per-flow data characteristics are insufficient • Typical classification: • loss-sensitive, high-throughput • delay-sensitive, low-latency • Instead, we need the distribution topology • i.e., consider the whole-network view

  6. CONDUIT data • Installed and configured the LDM to receive CONDUIT data from UCAR • Parsed and analyzed the log files for received data (9 sample days) • Peak throughput: 250 MB/minute (SD: 28.8 MB/minute) • Total size of generated data: ~60 GB/day (SD: 0.3 GB/day)

  7. Distribution structure • Downloaded and parsed real-time statistics of the CONDUIT feed tree (figures: CONDUIT Feed Tree Topology Information; Data Distribution Topology of the CONDUIT feedtype) • For the maximum fan-out of 104 receivers*, the peak bandwidth requirement is 104 × 250 MB/minute ≈ 3.5 Gbps • This is for just a single feedtype of a single application • *This maximum fan-out is for the UCAR site (idd.unidata.ucar.edu)
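The bandwidth figure on this slide follows from a unit conversion and a multiplication by the fan-out. A minimal sketch of that arithmetic, assuming decimal prefixes (1 MB = 10^6 bytes, 1 Gbps = 10^9 bit/s), which the slides do not state explicitly; the function names are illustrative only:

```python
# Assumes decimal prefixes: 1 MB = 10**6 bytes, 1 Gbps = 10**9 bit/s.

def mb_per_minute_to_bps(mb_per_min: float) -> float:
    """Convert a rate in MB/minute to bits per second."""
    return mb_per_min * 1e6 * 8 / 60

def aggregate_unicast_bps(per_receiver_bps: float, fan_out: int) -> float:
    """With routed unicast, the upstream server sends one copy per receiver."""
    return per_receiver_bps * fan_out

peak = mb_per_minute_to_bps(250)          # peak CONDUIT rate seen by one receiver
total = aggregate_unicast_bps(peak, 104)  # maximum fan-out at idd.unidata.ucar.edu
print(f"per receiver: {peak / 1e6:.1f} Mbps")  # per receiver: 33.3 Mbps
print(f"aggregate: {total / 1e9:.2f} Gbps")    # aggregate: 3.47 Gbps
```

Rounded, 3.47 Gbps matches the ≈ 3.5 Gbps quoted on the slide.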

  8. CONDUIT distribution topology http://www.unidata.ucar.edu/cgi-bin/rtstats/rtstats_topogif?CONDUIT

  9. Answer to question • Different network service types • Static unicast VCs: unsuitable • Dividing the NCAR access link bandwidth among 104 subscribers: with 10 Gbps, each subscriber gets only ~96 Mbps (10 Gbps / 104) • Subscribers want to receive the data as soon as possible, and a low-rate VC increases latency • Dynamic unicast VCs: unsuitable • For the worst-case fan-out of 104, the total delay would be greater than with IP service: a new circuit must be set up for each receiver, and this can be done only after the transfer to the previous receiver is complete and that receiver's circuit is released • Multicast: can save bandwidth and computing resources
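The serial argument against dynamic unicast VCs can be made concrete with a toy delay model. It assumes circuits are used strictly one at a time at full access-link rate, and compares them against an idealized routed service where all copies share the link concurrently. The setup and release times are hypothetical placeholders, not values from the slides:

```python
# Toy delay model: serial dynamic unicast VCs versus a shared routed link.
SETUP_S = 1.0    # hypothetical circuit setup time per receiver (seconds)
RELEASE_S = 1.0  # hypothetical circuit release time per receiver (seconds)

def serial_vc_delay(n_receivers: int, file_bytes: float, circuit_bps: float,
                    setup_s: float = SETUP_S, release_s: float = RELEASE_S) -> float:
    """Time until the last receiver has the file when circuits are used one at
    a time: receiver k finishes after k setups, k transfers, and k - 1 releases."""
    transfer_s = file_bytes * 8 / circuit_bps
    return n_receivers * (setup_s + transfer_s) + (n_receivers - 1) * release_s

def shared_link_delay(n_receivers: int, file_bytes: float, link_bps: float) -> float:
    """Idealized routed unicast: all copies share the access link concurrently."""
    return n_receivers * file_bytes * 8 / link_bps

serial = serial_vc_delay(104, 250e6, 10e9)
shared = shared_link_delay(104, 250e6, 10e9)
print(f"serial dynamic VCs: {serial:.1f} s")  # serial dynamic VCs: 227.8 s
print(f"shared routed link: {shared:.1f} s")  # shared routed link: 20.8 s
```

With zero setup and release times the two models give the same 20.8 s. Under these assumptions the extra delay of dynamic VCs comes entirely from per-circuit signaling overhead, which grows linearly with the fan-out.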

  10. New options: multicast and P2P • Multicast • Pros: for a given computing capacity of the upstream servers, the total delay to distribute the data to all receivers is lower; conversely, the same transfer delay as with IP-routed service or P2P can be achieved with less upstream server computing capacity • Cons: one or more slow receivers can slow down everyone • P2P • Pros: scales better with the number of receivers; suitable when participants obtain files at different times • Cons: not suitable for real-time or near-real-time delivery (a key requirement of IDD)
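The sender-side saving of multicast can be illustrated by counting the bytes the upstream server transmits. This is a minimal sketch that assumes each receiver's losses are independent and are repaired by unicast resends of exactly the lost fraction. The receiver population and loss rates below are made up for illustration:

```python
def unicast_sender_bytes(file_bytes: float, n_receivers: int) -> float:
    """Routed unicast: the upstream server sends a full copy to every receiver."""
    return file_bytes * n_receivers

def multicast_sender_bytes(file_bytes: float, loss_rates: list[float]) -> float:
    """Multicast: one copy for everyone, plus (assumed) unicast resends of
    each receiver's expected lost fraction."""
    return file_bytes + sum(p * file_bytes for p in loss_rates)

# Hypothetical population: 60 loss-free receivers and 40 receivers losing 1%.
loss_rates = [0.0] * 60 + [0.01] * 40
uni = unicast_sender_bytes(250e6, len(loss_rates))
multi = multicast_sender_bytes(250e6, loss_rates)
print(f"unicast: {uni / 1e6:.0f} MB, multicast: {multi / 1e6:.0f} MB")
# unicast: 25000 MB, multicast: 350 MB
```

The repair term also shows the "slow receivers" drawback: in this model, every lossy receiver adds sender work, regardless of how many fast receivers are present.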

  11. II. VCMTP: Design and Prototyping

  12. VCMTP Requirements • Goal: Design and implement a reliable and scalable transport protocol for data distribution over high-speed multipoint virtual circuits • Requirements • Reliability: error control, flow control • Scalability: support at least hundreds of receivers • High-speed multicast: support Gbps transfers

  13. VCMTP Operational Overview • A negative-acknowledgment (NACK)-based reliable transport protocol • Data blocks are transmitted over a multicast network service (which may be unreliable) • Retransmissions are carried over a reliable unicast service (e.g., TCP)
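A NACK-based receiver has to detect missing blocks on its own, because it never acknowledges what it did receive. The sketch below is one way to do that. It assumes each multicast block carries a sequence number from 0 to num_blocks - 1 and that the sender signals end of file. The class and method names are hypothetical, not VCMTP's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class NackTracker:
    """Hypothetical receiver-side gap detector for one file. Multicast blocks
    are assumed to carry sequence numbers 0..num_blocks-1."""
    num_blocks: int
    received: set[int] = field(default_factory=set)
    next_expected: int = 0

    def on_multicast_block(self, seq: int) -> list[int]:
        """Record a block from the (unreliable) multicast channel and return
        the sequence numbers skipped since the last block, i.e. the NACKs to send."""
        gaps = [s for s in range(self.next_expected, seq) if s not in self.received]
        self.received.add(seq)
        self.next_expected = max(self.next_expected, seq + 1)
        return gaps

    def on_end_of_file(self) -> list[int]:
        """When the sender signals end of file, NACK trailing blocks never seen."""
        tail = list(range(self.next_expected, self.num_blocks))
        self.next_expected = self.num_blocks
        return tail

    def on_retransmission(self, seq: int) -> None:
        """Record a block resent over the reliable unicast (e.g. TCP) channel."""
        self.received.add(seq)

    def is_complete(self) -> bool:
        return len(self.received) == self.num_blocks

t = NackTracker(num_blocks=6)
print(t.on_multicast_block(0))  # []
print(t.on_multicast_block(3))  # [1, 2]
print(t.on_multicast_block(4))  # []
print(t.on_end_of_file())       # [5]
for seq in (1, 2, 5):
    t.on_retransmission(seq)
print(t.is_complete())          # True
```

Retransmitted blocks bypass gap detection because the unicast channel is assumed reliable, so each NACK needs to be sent only once.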

  14. VCMTP Prototyping • A user-level library implemented in C++ for the Linux OS environment • Asynchronous programming model • Simultaneous data multicast and retransmission • [Figure: thread architecture. VCMTP sender process: sending thread, coordinator thread, and retransmission threads 1 … N. VCMTP receiver processes 1 … N: each has a receiving thread and a retransmission request thread.]
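The thread layout of the prototype can be mimicked in a single Python process to show how multicast and retransmission run at the same time. This is a minimal sketch, not the C++ library. `queue.Queue` objects stand in for the multicast channel and for each receiver's TCP connection. We read the figure as one retransmission thread per receiver. The coordinator thread is omitted, and each receiver's receiving and retransmission-request roles are merged into one thread. All names and the per-receiver drop sets are hypothetical:

```python
import queue
import threading

NUM_BLOCKS = 8
END = None  # sentinel: end of file on a multicast queue, or no more NACKs

def sending_thread(mcast_queues, drops):
    """Multicast each block once; drops[j] is a made-up set of blocks lost at receiver j."""
    for seq in range(NUM_BLOCKS):
        for j, q in enumerate(mcast_queues):
            if seq not in drops[j]:
                q.put(seq)
    for q in mcast_queues:
        q.put(END)

def retransmission_thread(nack_q, ucast_q):
    """One per receiver: resend each NACKed block over the emulated unicast channel."""
    while (seq := nack_q.get()) is not END:
        ucast_q.put(seq)

def receiver_thread(j, mcast_q, nack_q, ucast_q, results):
    """Receive multicast blocks, NACK gaps as they appear, then collect repairs."""
    got, next_expected, nacked = set(), 0, 0
    while (seq := mcast_q.get()) is not END:
        for gap in range(next_expected, seq):  # NACK while multicast continues
            nack_q.put(gap)
            nacked += 1
        got.add(seq)
        next_expected = seq + 1
    for gap in range(next_expected, NUM_BLOCKS):  # trailing losses
        nack_q.put(gap)
        nacked += 1
    nack_q.put(END)  # tell the retransmission thread no more requests will come
    for _ in range(nacked):
        got.add(ucast_q.get())
    results[j] = got

def run(drops):
    n = len(drops)
    mcast = [queue.Queue() for _ in range(n)]
    nacks = [queue.Queue() for _ in range(n)]
    ucast = [queue.Queue() for _ in range(n)]
    results = [None] * n
    threads = [threading.Thread(target=sending_thread, args=(mcast, drops))]
    for j in range(n):
        threads.append(threading.Thread(target=retransmission_thread,
                                        args=(nacks[j], ucast[j])))
        threads.append(threading.Thread(target=receiver_thread,
                                        args=(j, mcast[j], nacks[j], ucast[j], results)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

results = run([set(), {2, 7}, {0, 1, 5}])
print(all(r == set(range(NUM_BLOCKS)) for r in results))  # True
```

Giving each receiver its own repair path means a lossy receiver mostly occupies its own retransmission thread, although in a real system all these threads still share the sender's CPU and link.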

  15. Evaluation Metrics for Continuous File Transfers • Metric for fast receivers: throughput • n_f: number of fast receivers • m: number of continuously sent files • F_i: size of file i • T_i,vcmtp: transfer time for file i • Metric for slow receivers: robustness • n_s: number of slow receivers • m: number of continuously sent files • S_ij: an indicator variable set to 1 if file i was successfully received at receiver j, and 0 otherwise
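The metric formulas themselves did not survive in this transcript. The sketch below computes one plausible form that is consistent with the listed variables. Throughput is taken as the mean of 8·F_i / T over all fast receivers and files. Robustness is taken as the mean of S_ij over all slow receivers and files. Both forms are assumptions. So is the receiver index j on the transfer time, which the slide writes only as T_i,vcmtp:

```python
def fast_receiver_throughput_bps(file_sizes, transfer_times):
    """Assumed form: (1 / (n_f * m)) * sum_j sum_i (8 * F_i / T_ij).
    file_sizes[i] is F_i in bytes; transfer_times[j][i] is the transfer time
    of file i at fast receiver j, in seconds."""
    n_f, m = len(transfer_times), len(file_sizes)
    total = sum(8 * file_sizes[i] / row[i] for row in transfer_times for i in range(m))
    return total / (n_f * m)

def slow_receiver_robustness(success):
    """Assumed form: (1 / (n_s * m)) * sum_j sum_i S_ij, i.e. the fraction of
    (file, slow receiver) pairs delivered. success[j][i] is S_ij in {0, 1}."""
    n_s, m = len(success), len(success[0])
    return sum(sum(row) for row in success) / (n_s * m)

# Toy numbers, not experimental results.
thr = fast_receiver_throughput_bps([1e6, 2e6], [[0.1, 0.2], [0.2, 0.4]])
rob = slow_receiver_robustness([[1, 1, 0, 1], [1, 0, 1, 1]])
print(f"{thr / 1e6:.1f} Mbps, robustness {rob:.2f}")  # 60.0 Mbps, robustness 0.75
```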

  16. Experimental Evaluation: Throughput • Experiments conducted on the Emulab testbed (hosted by the University of Utah) • 40% of the receivers were slow receivers, experiencing random packet drops at different rates • Rho is the traffic intensity, calculated from the average file size (Pareto distribution) and the mean inter-arrival time (exponential distribution); link rate = 100 Mbps • Experiment: 500 files, repeated 5 times
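Traffic intensity here is the usual queueing ratio: the mean time to transmit one file at the link rate, divided by the mean time between file arrivals. A minimal sketch of that formula follows, together with a hypothetical workload generator that draws Pareto file sizes and exponential inter-arrival times for a target Rho. The shape parameter, mean file size, and seed are illustrative assumptions, not the experiment's settings:

```python
import random

LINK_BPS = 100e6  # link rate stated for the experiments

def traffic_intensity(mean_file_bytes, mean_interarrival_s, link_bps=LINK_BPS):
    """rho = (mean transmission time of a file) / (mean inter-arrival time)."""
    return (mean_file_bytes * 8 / link_bps) / mean_interarrival_s

def generate_workload(n_files, rho, mean_file_bytes, alpha, link_bps=LINK_BPS, seed=1):
    """Hypothetical generator: Pareto file sizes (shape alpha > 1) scaled so their
    mean is mean_file_bytes, and exponential gaps sized to give traffic intensity rho.
    Returns a list of (arrival_time_s, size_bytes) pairs."""
    if alpha <= 1:
        raise ValueError("Pareto mean is finite only for alpha > 1")
    x_min = mean_file_bytes * (alpha - 1) / alpha      # E[F] = alpha * x_min / (alpha - 1)
    mean_gap = mean_file_bytes * 8 / (rho * link_bps)  # rho formula solved for the gap
    rng = random.Random(seed)
    t, files = 0.0, []
    for _ in range(n_files):
        t += rng.expovariate(1 / mean_gap)                   # rate = 1 / mean gap
        files.append((t, x_min * rng.paretovariate(alpha)))  # paretovariate(alpha) >= 1
    return files

print(f"{traffic_intensity(5e6, 0.5):.2f}")  # 0.80
workload = generate_workload(500, rho=0.8, mean_file_bytes=5e6, alpha=2.5)
```

Heavy-tailed sizes make the realized load of a 500-file run fluctuate around the target Rho. That variability is one reason to repeat each experiment several times.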

  17. Experimental Evaluation: Robustness

  18. Key Evaluation Observations • An increase in the total number of receivers (and hence in the number of slow receivers) adversely affects both robustness and throughput because of resource contention • Both robustness and throughput decrease as traffic intensity (Rho) or loss rate increases • The sender-side retransmission timeout factor offers a knob for trading robustness against throughput
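The slides do not define the retransmission timeout factor precisely. The sketch below shows one hypothetical way such a knob could work. The sender keeps a file's retransmission state for the factor times the file's nominal multicast time. Repair requests that arrive after that deadline are not served. The rule, the function names, and the request delays are all assumptions for illustration:

```python
def retransmission_deadline_s(file_bytes, link_bps, timeout_factor):
    """Hypothetical rule: keep a file's retransmission state for timeout_factor
    times the file's nominal multicast time, then discard it."""
    return timeout_factor * file_bytes * 8 / link_bps

def requests_served(request_delays_s, deadline_s):
    """Count retransmission requests that arrive before the sender gives up."""
    return sum(1 for d in request_delays_s if d <= deadline_s)

delays = [0.05, 0.1, 0.2, 0.5]  # made-up NACK arrival delays from slow receivers (s)
for factor in (1, 5, 10):
    deadline = retransmission_deadline_s(1e6, 100e6, factor)
    print(factor, f"{deadline:.2f} s", requests_served(delays, deadline))
# 1 0.08 s 1
# 5 0.40 s 3
# 10 0.80 s 4
```

Under this model, a larger factor lets more slow receivers recover, which raises robustness. The cost is that sender threads and buffers stay tied up longer, which can delay later files for fast receivers and lower their throughput.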

  19. Summary • Multicast VCs are suitable for scientific data distribution applications • VCMTP, a reliable multicast transport protocol, was designed, prototyped, and evaluated • There is a tradeoff between robustness and throughput for continuous file delivery

  20. Thank You! & Questions?
