
End-To-End Provisioned Optical Network Testbed for Large-Scale eScience Applications

Nagi Rao, Bill Wing – Computer Science and Mathematics Division, Oak Ridge National Laboratory (raons@ornl.gov, wrw@ornl.gov); Tony Mezzacappa – Physics Division, Oak Ridge National Laboratory (mezzacappaa@ornl.gov)


Presentation Transcript


  1. End-To-End Provisioned Optical Network Testbed for Large-Scale eScience Applications
     Nagi Rao, Bill Wing – Computer Science and Mathematics Division, Oak Ridge National Laboratory (raons@ornl.gov, wrw@ornl.gov)
     Tony Mezzacappa – Physics Division, Oak Ridge National Laboratory (mezzacappaa@ornl.gov)
     Nov 12, 2003, Project Kick-off Meeting, University of Virginia
     Sponsored by the NSF Experimental Infrastructure Networks Program

  2. Outline of Presentation
     • Project details
     • TSI network and application interface requirements
     • Transport for dedicated channels
       – dynamics of shared streams
       – channel stabilization
     • Work plan

  3. ORNL Project Details
     Principal Investigators:
     • Nagi Rao – Computer Scientist/Engineer
     • Bill Wing – Network Engineer/Scientist
     • Tony Mezzacappa – Astrophysicist
     Technical Staff:
     • Qishi Wu – Post-Doctoral Fellow
     • Menxia Zhu – PhD Student
     • Steven Carter – Systems and Network Support
     Budget: $850K ($364K in year 1)

  4. TSI Computations: Networking Support
     Networking activities:
     • Data transfers: archive and supply massive amounts of data (terabyte/days)
     • Interactive visualizations: visualize archival or on-line data
     • Remote steering and control: steer computations and visualizations toward regions of interest
     • Coordinated operations: collaborative visualization and steering
     (Diagram: visualization, data, and control streams between the application components.)

  5. Types of Networking Channels
     High-bandwidth data channels:
     • Off-line transfers: terabyte datasets
       – supercomputers – high-performance storage systems
       – storage – host nodes and visualization servers
     • On-line transfers: supercomputers – visualization nodes
     Control and steering channels:
     • Interactive visualization – human response time
     • Computational steering – respond to the "inertia" of the computation
     Coordinated channels:
     • Coordinated visualization, steering, and archival
     • Multiple visualization and steering nodes
     On the Internet these channels can be supported only in a limited way:
     • It is difficult to sustain large data rates in a fair manner
     • Unpredictability of transport dynamics makes it very difficult to achieve stability

  6. Data Transfers Over Dedicated Channels
     Several candidate protocols (to be tested):
     • UDP-based data transport: UDT (SABUL), tsunami, hurricane, RBUDP, IQ-RUDP, and others
       – Advantages: application-level implementations and conceptually simple methods
       – Disadvantages: unstable code and hard-to-configure parameters
     • Tuned TCP methods: net100 – tune flow windows large enough to avoid self-created losses
       – Advantages: known mechanisms and tested kernel code
       – Disadvantages: physical losses are problematic – TCP interprets physical losses as congestion and reduces throughput
     Host issues for 1–10 Gbps rates (impedance-match issues):
     • Buffering in NIC, kernel, and application; disk speeds
     • Zero-copy kernel patch and ST
     • OS bypass, RDMA
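None of the listed protocols' internals appear on the slide; as an illustration only, the sketch below shows the common idea behind the UDP-based transports named above (pace UDP datagrams at a configured rate rather than relying on TCP congestion control). The host name, port, packet size, and target rate are hypothetical, and real tools add sequence numbering and loss recovery.

```python
# Minimal sketch (illustration only): rate-paced UDP sending, the common idea
# behind UDP-based transports such as UDT/SABUL, tsunami, and RBUDP.
# The destination, packet size, and target rate are hypothetical values.
import socket
import time

def send_paced(data: bytes, dest=("receiver.example.org", 5001),
               packet_size=8192, target_mbps=400.0):
    """Send `data` as UDP datagrams paced to roughly target_mbps."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    interval = (packet_size * 8) / (target_mbps * 1e6)  # seconds per packet
    next_send = time.monotonic()
    for offset in range(0, len(data), packet_size):
        sock.sendto(data[offset:offset + packet_size], dest)
        next_send += interval
        sleep = next_send - time.monotonic()
        if sleep > 0:          # pace the stream; real tools also recover losses
            time.sleep(sleep)
    sock.close()
```

A complementary receiver-side loss-recovery sketch follows slide 9.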

  7. Multiple Streams Over Dedicated Channels
     Example:
     • Monitor computation through a visualization channel
     • Interactive visualization – rotate, project different subspaces
     • Computational steering – specify parameters on the fly
     • Archive/load the data – store the interesting data
     Option 1:
     • Dedicated channels for each stream
     • 4 NICs – 4 MSPP slots
     Option 2:
     • Share dedicated channels
     • Single NIC and MSPP slot
     • Realize sharing at the protocol or application level
     Option 3:
     • Visualization streams on one channel; data and steering streams on another channel
     • Two NICs and MSPP slots
     • Realize sharing at the protocol or application level
     (Diagram: visualization stream, visualization control, steering, and data stream between the hosts and high-performance storage.)

  8. Terminology Review
     • Connection (logical): host site to host site
     • Circuit / channel / bandwidth pipe (physical): NIC to NIC
     • Stream (logical): application-to-application connection
     (Diagram: visualization, data, and control streams carried over the connection.)

  9. Dedicated NIC-NIC Channels
     Advantages: no other traffic on the channel
     • Simpler protocols: rate controllers with loss-recovery mechanisms would suffice for
       – data transfers and
       – control channels for host-host connections
     • Coordination between the streams can be handled at the application/middleware level
     Disadvantages:
     • Scaling problems:
       – a single connection requires 4 NIC-NIC pairs and 4 channels in the example
       – a main computation site supporting 5 users requires a host with 20 NICs and 20 channels, and an MSPP with at least 20 slots (e.g., 5 blades, each with 4 GigE slots)
     • Utilization problems:
       – even a small control stream needs an entire channel (the minimum provisioning resolution), e.g., a 10 Mbps control stream on a GigE channel
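To illustrate the "rate controller plus loss recovery" point above, here is a hedged sketch of the loss-recovery half: a receiver that tracks sequence numbers and NACKs gaps back to the sender. The 8-byte sequence header, port, and NACK batching interval are assumptions for illustration, not part of any protocol named in this talk.

```python
# Illustration only: sequence tracking with NACK-based loss recovery at the
# receiver, the kind of mechanism a dedicated-channel rate protocol could use.
# The packet layout (8-byte big-endian sequence number + payload) is assumed.
import socket
import struct

def receive_with_nacks(listen=("0.0.0.0", 5001), total_packets=100000):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(listen)
    received = set()
    while len(received) < total_packets:
        datagram, sender = sock.recvfrom(65536)
        (seq,) = struct.unpack("!Q", datagram[:8])
        received.add(seq)
        # Every so often, NACK holes below the highest sequence seen so far.
        if seq % 1000 == 0:
            missing = [s for s in range(seq) if s not in received][:512]
            if missing:
                nack = struct.pack(f"!{len(missing)}Q", *missing)
                sock.sendto(b"NACK" + nack, sender)
    sock.close()
```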

  10. Multiple Streams on Single NIC-NIC Channel
     Streams interact and affect each other:
     • Packets may be "pooled" at the source and destination nodes:
       – NIC – interrupt coalescing and buffer clearing
       – NIC–kernel transfers through buffers
       – kernel–application transfers
     • Processor load determines interrupt response time at finer levels
     Two important consequences:
     • Protocols or applications need to "share" the channel
       – need protocols that allow for appropriate bandwidth sharing
       – TCP-like paradigm but a more structured problem: the total bandwidth is known and the competing traffic is host-generated
     • Protocol interaction could generate complicated dynamics
       – need protocols that stabilize the dynamics for control channels
       – very few protocols exist that protect against "underflow"
     • Need a combination of existing and newer protocols
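Because the channel capacity is known and all competing traffic is generated by the end hosts, the bandwidth split can be decided locally rather than discovered TCP-style. The sketch below is one simple illustration of that idea: weighted static allocation of the known channel rate among the streams, each of which is then rate-paced. Stream names, weights, and the 1 Gbps figure are assumptions.

```python
# Illustration only: splitting a known dedicated-channel capacity among
# host-generated streams by weight, instead of probing for bandwidth TCP-style.
# Stream names, weights, and the 1 Gbps figure are assumed values.

CHANNEL_MBPS = 1000.0  # e.g., a dedicated GigE NIC-NIC channel

def allocate(streams: dict) -> dict:
    """streams: {name: weight}; returns {name: rate_mbps} summing to capacity."""
    total = sum(streams.values())
    return {name: CHANNEL_MBPS * weight / total for name, weight in streams.items()}

if __name__ == "__main__":
    rates = allocate({"data": 8, "visualization": 3, "steering": 1})
    # Each stream would then be rate-paced at rates[name], as in the slide-6 sketch.
    print(rates)   # roughly {'data': 666.7, 'visualization': 250.0, 'steering': 83.3}
```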

  11. TSI Application Interfaces and Networking Modules
     (Layered diagram:)
     • Application interfaces: computational steering, dynamics visualization, data transfers – Application Modules 1, 2, 3
     • Middleware: stabilization modules, control modules, bulk transport modules
     • Protocols: streaming protocols
     • Channels: dedicated provisioned channels

  12. Interfacing with Visualization Modules
     Overall approach: separate the steering and display components
     • Steering module – connect it to the visualization control channel
     • Display module
       – separate the rendering and display sub-modules and locate them at hosts
       – connect the sub-modules over data channels
     Candidates under consideration – all need hooks to use dedicated channels:
     • OpenGL, VTK codes – code needs to be modified with appropriate calls – non-trivial
     • EnSight
       – can operate across IP networks without firewalls
       – high cost and no access to source code
     • Paraview
       – stability problems and hard to use
     • Aspect (?)
       – developed at ORNL
       – functionality similar to Paraview, with additional analysis modules
       – developers are willing to incorporate CHEETAH modules
       – on-line streaming, large datasets

  13. Optimizing the Visualization Pipeline on a Network
     Decomposition of the visualization pipeline:
     • The "links" have different bandwidths
       – the geometry could be larger than the data
       – the display bandwidth can be much smaller – human consumption
     • The tasks require different computational power
       – large datasets require a cluster to compute the geometry
       – rendering can be done on graphics-enabled machines
       – the display can be transferred to an X-enabled machine
     The pipeline can be realized over the network and the display forwarded to the user's host.
     (Pipeline diagram: data storage → geometry computation → rendering → display at the host node.)
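As a toy illustration of "realizing the pipeline over the network", the sketch below picks the point at which to cut the pipeline between the remote cluster and the user's host so that the smallest intermediate product crosses the network link. Stage names and byte counts are hypothetical; the real decision would also weigh the computational power available at each end, as noted on the slide.

```python
# Toy illustration: choose where to split the visualization pipeline
# (data storage -> geometry computation -> rendering -> display) between a
# remote cluster and the user's host so that the smallest intermediate
# product is the one shipped over the network. Sizes are hypothetical.

STAGES = [                        # (stage, bytes produced by that stage)
    ("data storage",         2_000_000_000_000),  # terabyte-scale dataset
    ("geometry computation", 3_000_000_000_000),  # geometry can exceed the data
    ("rendering",               50_000_000_000),  # rendered frames
]

def best_cut(stages):
    """Index of the last stage to run remotely; its output crosses the link."""
    return min(range(len(stages)), key=lambda i: stages[i][1])

cut = best_cut(STAGES)
remote = [name for name, _ in STAGES[:cut + 1]]
local = [name for name, _ in STAGES[cut + 1:]] + ["display"]
print(f"remote: {remote}; ship {STAGES[cut][1]:,} bytes of "
      f"'{STAGES[cut][0]}' output; local: {local}")
```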

  14. Protocols for Dedicated Channels – Multiple Data Streams
     The problem is simpler than on the Internet:
     • Total available channel bandwidth is known
     • All traffic is generated by the nodes and is "known"
     • Fairness issues are simpler – nodes can allocate bandwidth among streams
     TCP addresses these problems over the Internet:
     • slow-start to figure out the available bandwidth
     • packet loss and time-outs to infer traffic levels
     • AIMD to adjust the flow rate
     Bandwidth partitioning among data streams might require closed-loop control.
     Simple (open-loop) control of data rates at the application level does not always work. Example: the NIC has higher capacity than the provisioned channel:
     1. packets might be combined and sent out at a higher rate by the NIC, causing losses at the MSPP
     2. packets can be coalesced at the receiver NIC, resulting in receive rates that differ from the sending rate
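The point that open-loop rate setting can fail (NIC-level coalescing changes the rate actually seen on the wire) suggests a simple closed loop: periodically compare each stream's receiver-reported goodput with its allocated share and nudge the application-level sending rate. The sketch below is an illustration under that reading, not the project's protocol; the gain constant and example numbers are arbitrary.

```python
# Illustration only: closed-loop correction of application-level sending rates
# using receiver-reported goodput, to compensate for NIC/kernel coalescing
# effects that make open-loop rate setting unreliable. The gain is arbitrary.

def adjust_rates(targets_mbps: dict, reported_goodput_mbps: dict,
                 current_rates_mbps: dict, gain: float = 0.5) -> dict:
    """Return new per-stream sending rates nudged toward the target goodputs."""
    new_rates = {}
    for stream, target in targets_mbps.items():
        error = target - reported_goodput_mbps.get(stream, 0.0)
        new_rates[stream] = max(0.0, current_rates_mbps[stream] + gain * error)
    return new_rates

# Example: the 'data' stream is landing below its share, so its rate is raised.
print(adjust_rates({"data": 600.0, "viz": 200.0},
                   {"data": 540.0, "viz": 200.0},
                   {"data": 600.0, "viz": 200.0}))
# -> {'data': 630.0, 'viz': 200.0}
```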

  15. Protocols for Dedicated Channels – Multiple Data and Control Streams
     The problem is to maintain "steady" dynamics for the control streams between applications – not just between NICs or on the line.
     Complicated end-to-end dynamics can be caused by various factors:
     • Channel losses:
       – physical losses
       – losses due to the sum of the streams exceeding the capacity
     • Impedance mismatch between
       – NIC and line
       – NIC and kernel
       – kernel and application
     On the Internet:
     • Only a probabilistic solution is possible because of complicated cross-traffic dynamics – our solutions are based on stochastic approximation
     • TCP does not solve the problem:
       – multiple TCP/UDP streams generate chaos-like dynamics
       – a single TCP stream on a dedicated channel has an underflow problem: even with the flow window tuned to the desired level and AIMD kept from kicking in, a burst of losses can kill the stream – TCP interprets it as congestion
     This problem is still simpler than the Internet case: here the cross-traffic is generated by the nodes and is "known".
     Channels must be explicitly stabilized using application-level closed-loop control.

  16. Complicated Dynamics of Interacting Streams
     Simulation results: TCP-AIMD exhibits chaos-like trajectories
     • TCP streams competing with each other on a dedicated link (Veres and Boda 2000)
     • TCP competing with UDP on a dedicated link (Rao and Chua 2002)
     Analytical results (Rao and Chua 2002): TCP-AIMD has chaotic regimes
     • competing with steady UDP streams on a dedicated link
     • state-space analysis and Poincaré maps
     Internet measurements (2003, last few weeks): TCP-AIMD traces are a complicated mixture of stochastic and chaotic components
     Note: on dedicated links we expect less or no chaotic component

  17. Internet Measurements – Joint Work with Jianbo Gao
     Question: How relevant are the simulation and analytical results on chaotic trajectories?
     Answer: Only partially. Internet (net100) traces show that TCP-AIMD dynamics are a complicated mixture of chaotic and stochastic regimes:
     • Chaotic – TCP-AIMD dynamics
     • Stochastic – TCP response to network traffic
     Basic point: TCP traces collected on all Internet connections showed complicated dynamics
     • the classical "saw-tooth" profile is not seen even once
     • this is not a criticism of TCP; it was not intended for smooth dynamics

  18. Cwnd Time Series for the ORNL-LSU Connection
     (Figure. Connection: OC-192 to Atlanta SOX; Internet2 to Houston; LAnet to LSU.)

  19. Both the Stochastic and Chaotic Parts Are Dominant
     (Comparison panels: Lorenz – chaotic, common envelope; uniform random – spread out.)
     TCP traces have:
     • a common envelope, and
     • spread out at certain scales

  20. Characterized as Anomalous Diffusions
     (Figure: log-log displacement curves.)
     Large exponent: typical of chaotic systems with injected noise
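For readers unfamiliar with this analysis, the "log-log displacement curve" characterization can be reproduced on any cwnd trace roughly as follows: treat the mean-removed trace as increments of a walk, compute the mean-squared displacement over increasing lags, and fit the slope on log-log axes. The sketch below is a generic illustration of that recipe, not the authors' analysis code; the lag range is an assumed parameter.

```python
# Generic illustration (not the authors' code): estimate a diffusion exponent
# from a cwnd time series via the slope of the log-log mean-squared
# displacement curve. H ~ 0.5 is ordinary diffusion; larger is anomalous.
import numpy as np

def diffusion_exponent(trace, max_lag=200):
    walk = np.cumsum(np.asarray(trace, dtype=float) - np.mean(trace))
    lags = np.arange(1, max_lag)
    msd = [np.mean((walk[lag:] - walk[:-lag]) ** 2) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return slope / 2.0   # displacement exponent H, where MSD ~ lag^(2H)

# Example with synthetic data; a real cwnd trace (e.g., from net100) goes here.
rng = np.random.default_rng(0)
print(diffusion_exponent(rng.normal(size=5000)))  # ~0.5 for uncorrelated noise
```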

  21. End-to-End Delay Dynamics Control: End Filtering
     Objective: achieve smooth end-to-end delay
     Solution:
     1. Reduce end-to-end delay using two paths via daemons: ORNL–OU and ORNL–ODU–OU
     2. Filter the output at the destination
     (Figure: Internet connection from the ORNL source to the U. Oklahoma destination, directly and via Old Dominion Univ., with a filter at the destination. X-axis: message sizes (bytes); y-axis: end-to-end delay (sec).)
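The two-step solution above is shown only as a figure; one plausible reading, sketched here purely as an illustration, is to take the earlier arrival of each message across the two paths and then smooth the residual jitter with a short moving median at the destination. The "first arrival" rule and the window length are assumptions, not taken from the slide.

```python
# Illustration only (one plausible reading of the slide): combine per-message
# delays from two paths by taking the earlier arrival, then smooth residual
# jitter with a short moving median. The window length is an assumed parameter.
import statistics

def filter_delays(direct_delays, relayed_delays, window=5):
    """direct/relayed: per-message end-to-end delays (s); returns smoothed list."""
    combined = [min(d, r) for d, r in zip(direct_delays, relayed_delays)]
    smoothed = []
    for i in range(len(combined)):
        lo = max(0, i - window + 1)
        smoothed.append(statistics.median(combined[lo:i + 1]))
    return smoothed

# Example: a delay spike on the direct path is masked by the relayed path.
print(filter_delays([0.041, 0.040, 0.250, 0.042], [0.055, 0.056, 0.054, 0.055]))
```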

  22. Throughput Stabilization – Joint Work with Qishi Wu
     Niche application requirement: provide stable throughput at a target rate – typically much below the peak bandwidth
     • Commands for computational steering and visualization
     • Control loops for remote instrumentation
     TCP AIMD is not suited for stable throughput:
     • complicated dynamics
     • underflows with sustained traffic

  23. Measurements: ORNL-LSU
     ORNL-LSU old connection: ESnet peering with Abilene in New York; both hosts have 10 Mbps NICs
     • Throughput stabilized within seconds at the target rate and was stable under:
       – large and small ftp transfers at the hosts and on the LAN
       – web browsing

  24. Stochastic Approximation: UDP Window-Based Method
     (Diagram: transport control loop.)
     Objective: adjust the source rate to achieve (almost) fixed goodput at the destination
     Difficulty: data packets and acks are subject to random processes
     Approach: rely on statistical properties of the data paths
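The slides describe this control loop only at a block-diagram level; the sketch below shows the general shape of such a loop under the assumptions stated in the comments (it is not the ONTCOU implementation): each cycle the sender transmits a window of datagrams, reads back a goodput estimate from the receiver, and applies a Robbins-Monro-style correction to the cycle (sleep) time or, equivalently, the window size. `measure_goodput` stands in for the receiver feedback path and is an assumed interface.

```python
# Illustration only (not the ONTCOU code): the general shape of a UDP
# window-based stochastic-approximation loop that steers goodput toward a
# target despite noisy measurements. measure_goodput() is an assumed stand-in
# for the receiver's acknowledgement/feedback path.

def stabilize(target_mbps, measure_goodput, cycles=200,
              window_pkts=64, packet_bits=1400 * 8, gain0=0.8):
    cycle_time = window_pkts * packet_bits / (target_mbps * 1e6)  # seconds
    for k in range(1, cycles + 1):
        send_rate = window_pkts * packet_bits / (cycle_time * 1e6)  # Mbps
        goodput = measure_goodput(send_rate)      # noisy estimate from receiver
        step = gain0 / k                          # decreasing SA gain
        # Robbins-Monro-style correction applied to the cycle (sleep) time;
        # the same idea can be applied to window_pkts instead.
        cycle_time = max(1e-4, cycle_time * (1.0 + step * (goodput - target_mbps)
                                             / target_mbps))
    return send_rate

# Toy channel: 10% loss below an 8 Mbps cap; the loop settles near a sending
# rate of ~3.33 Mbps, which yields ~3.0 Mbps goodput at the destination.
print(stabilize(3.0, lambda r: 0.9 * min(r, 8.0)))
```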

  25. Throughput and Loss Rates vs. Window Size and Cycle Time
     (Measurement panels: typical day and Christmas day.)
     Objective: adjust the source rate to yield the desired throughput at the destination

  26. Adaptation of the Source Rate
     • Adjust the window size
     • Adjust the cycle time
     • Both are special cases of the classical Robbins-Monro method
     (Update equations shown on the slide relate the target throughput to a noisy estimate.)
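The update equations on this slide are embedded as images and do not survive in the transcript. As a hedged reconstruction, the classical Robbins-Monro form being referred to can be written as follows, with symbols chosen here for illustration: $r_k$ is the controlled quantity (window size, or cycle time with the sign flipped since goodput decreases as the cycle time grows), $g^{*}$ the target goodput, $\hat{g}_k$ the noisy goodput estimate, and $a_k$ a decreasing gain sequence.

```latex
% Hedged reconstruction of the classical Robbins-Monro update referenced on
% the slide; the symbols are chosen here for illustration, not copied from it.
r_{k+1} = r_k + a_k \left( g^{*} - \hat{g}_k \right),
\qquad a_k > 0, \quad \sum_k a_k = \infty, \quad \sum_k a_k^2 < \infty .
```

Under the standard gain conditions shown above (e.g., $a_k = a/k$), the iterate converges to the setting at which the expected goodput equals the target, which is the sense in which the next slide's guarantees hold.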

  27. Performance Guarantees
     • Summary: stabilization is achieved with high probability with a very simple estimation of the source rate
     • Basic result: for the general update [equation on slide], we have [probabilistic bound on slide]

  28. Internet Measurements
     ORNL-LSU connection (before the recent upgrade):
     • hosts with 10 Mbps NICs
     • 2000-mile network distance
     • ORNL–NYC – ESnet
     • NYC–DC–Houston – Abilene
     • Houston–LSU – local networks
     ORNL-GaTech connection:
     • hosts with GigE NICs
     • ORNL–Juniper router – 1 Gig link
     • Juniper–ATL SOX – OC-192 (1 Gig link)
     • SOX–GaTech – 1 Gig link

  29. ORNL-LSU Connection

  30. Goodput Stabilization: ORNL-LSU Experimental Results
     • Case 1: target goodput = 1.0 Mbps, rate control through the congestion window, a = 0.8
     • Case 2: target goodput = 2.0 Mbps, rate control through the congestion window, a = 0.8
     (Figures for each case: datagram acknowledging time vs. source rate (Mbps) and goodput (Mbps).)

  31. Goodput Stabilization: ORNL-LSU Experimental Results
     • Case 3: target goodput = 3.0 Mbps, rate control through the congestion window, a = 0.8
     (Figure: datagram acknowledging time vs. source rate (Mbps) and goodput (Mbps).)

  32. Goodput Stabilization: ORNL-LSU Experimental Results
     • Case 4: target goodput = 2.0 Mbps, rate control through sleep time, a = 0.8
     • Case 5: target goodput = 2.0 Mbps, rate control through sleep time, a = 0.9
     (Figures for each case: datagram acknowledging time vs. source rate (Mbps) and goodput (Mbps).)

  33. Throughput Stabilization: ORNL-GaTech
     • Desired goodput level = 2.0 Mbps, a = 0.8, adjustment made on the sleep time
     • Desired goodput level = 20.0 Mbps, a = 0.8, adjustment made on the congestion window

  34. Experiments with tsunami: firebird.ccs.ornl.gov – ccil.cc.gatech.edu
     Network transport control settings:
     • NIC speed and path bandwidth: 1 Gbps
     • Transferred file size: 204,800,000 bytes
     • Using default_block_size: 32768 bytes
     Transmission statistics from tsunami:
     • Average sending rate: 296.05 Mbps
     • Loss rate: 64.32%
     • Transfer time: 17.51 sec
     • Throughput: 93.6 Mbps
     (Figure on the next slide: sending time and receiving time vs. block sequence number.)
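As a back-of-the-envelope consistency check (added here, not on the slide), the reported throughput follows directly from the file size and transfer time:

```latex
% Consistency check of the reported tsunami throughput.
\frac{204{,}800{,}000 \text{ bytes} \times 8 \text{ bits/byte}}{17.51 \text{ s}}
\approx \frac{1.638 \times 10^{9} \text{ bits}}{17.51 \text{ s}}
\approx 93.6 \text{ Mbps}
```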

  35. Tsunami Measurements: ozy4.csm.ornl.gov – resource.rrl.lsu.edu
     • Path bandwidth: 10 Mbps
     • Using datagram size: 1400 bytes (the default one doesn't work)
     • File size: 10,240,000 bytes
     Case 1: only tsunami running
     • Throughput: 9.47 Mbps (receiver, client)
     • Goodput: 4.20 Mbps (sender, server)
     (Figure: sending time and receiving time vs. datagram sequence number.)

  36. Case 2: Only ONTCOU (throughput-maximization SA) Running
     • Source goodput: 3.5 Mbps
     (Figures: sending time and acknowledging time vs. datagram sequence number; sending rate vs. source goodput.)

  37. Case 3: Tsunami and ONTCOU Running Simultaneously with the Same Datagram Size
     • Tsunami: not completed
     • ONTCOU: transmission completed; throughput 0.533 Mbps
     (Figure on the next slide: sending time and acknowledging time vs. datagram sequence number.)

  38. ORNL Year 1 Tasks
     Design and test transport protocols for dedicated channels:
     • single data streams – collaboration with UVa
     • one data and two control streams
     • testing on the ORNL-ATL-ORNL GigE-SONET link
     Interfaces with visualization software:
     • simple supernova computation at ORNL hosts on the dedicated link
     • developing interfaces to Aspect visualization modules and testing
     • test Paraview and EnSight
     (Diagram: two ORNL Linux hosts connected through a Juniper M160 router over an OC-192 link to the SOX router in Atlanta and back.)

  39. ORNL Year 2 Tasks
     Design and test transport protocols for dedicated channels:
     • multiple data, visualization, and control streams
     • testing on the CHEETAH testbed
     Interface with visualization:
     • interfacing supernova visualization modules over CHEETAH
     • developing interfaces to Aspect visualization modules with the TSI dataset

  40. ORNL Year 3 Tasks
     Design and test transport protocols for dedicated channels:
     • collaborating multiple data, visualization, and control streams
     • testing on the CHEETAH testbed
     Interface with visualization:
     • interfacing supernova visualization and computation modules over CHEETAH
     • developing interfaces to Aspect visualization modules with TSI on-line computations
     • optimizing the mapping of the visualization pipeline

  41. Feedback and Corrections

  42. Interfacing with Steering Modules
     The dynamics of visualization control and steering streams must be stabilized from application to application:
     • It is not enough to stabilize the lower transport levels
       – NIC-to-line transfers may not be smooth
       – application-to-kernel transfers depend on the processor load
     • Provide a user interface for steering and connect it to the transport modules
