UltraLight Overview • Shawn McKee / University of Michigan • USATLAS Tier1 & Tier2 Network Planning Meeting • December 14, 2005 - BNL
The UltraLight Project • UltraLight is • A four-year, $2M NSF ITR funded by MPS. • Application-driven network R&D. • A collaboration of BNL, Caltech, CERN, Florida, FIU, FNAL, Internet2, Michigan, MIT, SLAC. • Significant international participation: Brazil, Japan, Korea amongst many others. • Goal: Enable the network as a managed resource. • Meta-Goal: Enable physics analysis and discoveries which could not otherwise be achieved.
UltraLight Backbone • UltraLight has a non-standard core network with dynamic links and varying bandwidth interconnecting our nodes. • Optical Hybrid Global Network • The core of UltraLight evolves dynamically as a function of available resources on other backbones such as NLR, HOPI, Abilene or ESnet. • The main resources for UltraLight: • LHCnet (IP, L2VPN, CCC) • Abilene (IP, L2VPN) • ESnet (IP, L2VPN) • Cisco NLR wave (Ethernet) • Cisco Layer 3 10GE Network • HOPI NLR waves (Ethernet; provisioned on demand) • UltraLight nodes: Caltech, SLAC, FNAL, UF, UM, StarLight, CENIC PoP at LA, CERN
UltraLight Layer 3 Connectivity • Shown (courtesy of Dan Nae) is the current UltraLight Layer 3 connectivity as of mid-October 2005
UltraLight Sites • UltraLight currently has 10 participating core sites (shown alphabetically) • Details and diagrams for each site will be reported Tuesday during “Network” day
UltraLight Network: PHASE I • Implementation via “sharing” with HOPI/NLR • Also LA-CHI Cisco/NLR Research Wave • DOE UltraScienceNet Wave SNV-CHI (LambdaStation) • Connectivity to FLR to be determined • MIT involvement welcome, but unfunded • [Map: Plans for Phase I from Oct. 2004; labels: AMPATH, UERJ, USP]
UltraLight Network: PHASE II • Move toward multiple “lambdas” • Bring in FLR, as well as BNL (and MIT) • General comment: We are almost here! • [Map: Phase II network; labels: AMPATH, UERJ, USP]
UltraLight Network: PHASE III • Move into production • Optical switching fully enabled amongst primary sites • Integrated international infrastructure • Certainly reasonable sometime in the next few years… • [Map: Phase III network; labels: AMPATH, UERJ, USP]
Workplan/Phased Deployment • UltraLight envisions a 4-year program to deliver a new, high-performance, network-integrated infrastructure: • Phase I will last 12 months and focus on deploying the initial network infrastructure and bringing up first services • Phase II will last 18 months and concentrate on implementing all the needed services and extending the infrastructure to additional sites • Phase III will complete UltraLight and last 18 months. The focus will be on a transition to production in support of LHC Physics (+ eVLBI Astronomy, +<insert your e-Science here>?) • We are HERE!
UltraLight Network Engineering • GOAL: Determine an effective mix of bandwidth-management techniques for this application space, particularly: • Best-effort/“scavenger” using effective ultrascale protocols • MPLS with QoS-enabled packet switching • Dedicated paths arranged with TL1 commands, GMPLS • PLAN: Develop and test the most cost-effective integrated combination of network technologies on our unique testbed: • Exercise UltraLight applications on NLR, Abilene and campus networks, as well as LHCNet, and our international partners • Progressively enhance Abilene with QoS support to protect production traffic • Incorporate emerging NLR and RON-based lightpath and lambda facilities • Deploy and systematically study ultrascale protocol stacks (such as FAST), addressing issues of performance & fairness • Use MPLS/QoS and other forms of BW management, and adjustments of optical paths, to optimize end-to-end performance among a set of virtualized disk servers (a simple path-selection sketch follows below)
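As a rough illustration of how a transfer request might be mapped onto one of these three bandwidth-management classes, here is a minimal, purely hypothetical Python sketch; the class names, thresholds and TransferRequest fields are assumptions for illustration and are not part of any UltraLight service:

```python
# Hypothetical sketch: pick a bandwidth-management class for a transfer request.
# Names, thresholds and fields are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class TransferRequest:
    size_gb: float        # total data to move, in gigabytes
    deadline_s: float     # seconds until the data must arrive
    priority: int         # 1 (highest) .. 5 (lowest)

def required_gbps(req: TransferRequest) -> float:
    """Average rate needed to meet the deadline."""
    return (req.size_gb * 8) / req.deadline_s

def choose_path_class(req: TransferRequest) -> str:
    rate = required_gbps(req)
    if rate >= 5.0 and req.priority <= 2:
        return "dedicated-lightpath"   # TL1/GMPLS-provisioned circuit
    if rate >= 1.0:
        return "mpls-qos"              # MPLS tunnel with QoS marking
    return "best-effort"               # scavenger traffic over ultrascale TCP

if __name__ == "__main__":
    req = TransferRequest(size_gb=1200, deadline_s=2 * 3600 + 50 * 60, priority=3)
    print(choose_path_class(req), f"({required_gbps(req):.2f} Gbps needed)")
```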
UltraLight: Effective Protocols • The protocols used to reliably move data are a critical component of Physics “end-to-end” use of the network • TCP is the most widely used protocol for reliable data transport, but it becomes increasingly inefficient as the bandwidth-delay product of the network grows. • UltraLight is exploring extensions to TCP (HSTCP, Westwood+, HTCP, FAST) designed to maintain fair sharing of networks while still allowing efficient, effective use of these networks. • Currently FAST is in our “UltraLight Kernel” (a customized 2.6.12-3 kernel), which was used at SC2005. We are planning to broadly deploy a related kernel with FAST; longer term we can then continue with access to FAST, HS-TCP, Scalable TCP, BIC and others.
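The slides do not prescribe a deployment mechanism, but on Linux the congestion-control algorithm can be chosen per socket; a minimal sketch (Linux only, Python 3.6+; which algorithms are actually available depends on the running kernel, and FAST in particular was a research patch rather than a mainline option):

```python
# Sketch: selecting the TCP congestion-control algorithm for one socket on Linux.
# Available algorithms are listed in /proc/sys/net/ipv4/tcp_available_congestion_control.
import socket

def open_with_cc(host: str, port: int, algorithm: str = "bic") -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_CONGESTION is Linux-specific (exposed in Python 3.6+).
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, algorithm.encode())
    s.connect((host, port))
    in_use = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    print("congestion control in use:", in_use.split(b"\x00")[0].decode())
    return s
```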
UltraLight Kernel Development • Having a standard tuned kernel is very important for a number of UltraLight activities: • Breaking the 1 GB/sec disk-to-disk barrier • Exploring TCP congestion control protocols • Optimizing our capability for demos and performance • The current kernel incorporates the latest FAST and Web100 patches over a 2.6.12-3 kernel and includes the latest RAID and 10GE NIC drivers. • The UltraLight web page (http://www.ultralight.org) has a Kernel page, reachable from the Workgroup->Network page, which provides the details (a host-tuning sketch follows below)
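A minimal sketch of the kind of host tuning such a standard kernel setup is typically paired with for 10 GE WAN transfers; the specific buffer values below are illustrative assumptions, not the actual UltraLight kernel settings (Linux only, requires root):

```python
# Sketch: apply large socket-buffer limits for high bandwidth-delay-product paths.
# Values are illustrative assumptions, not the UltraLight kernel's actual settings.
TUNING = {
    "net/core/rmem_max": "134217728",            # max socket receive buffer (128 MB)
    "net/core/wmem_max": "134217728",            # max socket send buffer (128 MB)
    "net/ipv4/tcp_rmem": "4096 87380 134217728", # min / default / max TCP receive buffer
    "net/ipv4/tcp_wmem": "4096 65536 134217728", # min / default / max TCP send buffer
}

def apply_tuning(settings=TUNING):
    for key, value in settings.items():
        with open(f"/proc/sys/{key}", "w") as f:   # equivalent to `sysctl -w`
            f.write(value)
        print(f"{key} = {value}")

if __name__ == "__main__":
    apply_tuning()
```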
MPLS/QoS for UltraLight • UltraLight plans to explore the full range of end-to-end connections across the network, from best-effort, packet-switched through dedicated end-to-end light-paths. • MPLS paths with QoS attributes fill a middle ground in this network space and allow fine-grained allocation of virtual pipes, sized to the needs of the application or user. • [Figure: TeraPaths initial QoS test at BNL] • UltraLight, in conjunction with the DoE/MICS-funded TeraPaths effort, is working toward extensible solutions for implementing such capabilities in next-generation networks • TeraPaths URL: http://www.atlasgrid.bnl.gov/terapaths/
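For the packet-switched middle ground, end hosts can mark their traffic so that QoS-enabled routers classify it into the intended queue; a minimal sketch, assuming DSCP Expedited Forwarding marking (the actual TeraPaths/UltraLight marking scheme is not specified in these slides):

```python
# Sketch: mark a flow's packets with a DSCP code point so QoS-configured routers
# can place it in a priority queue. The EF value is an illustrative assumption.
import socket

DSCP_EF = 46              # Expedited Forwarding
TOS_BYTE = DSCP_EF << 2   # DSCP occupies the upper six bits of the TOS byte

def marked_socket() -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_BYTE)
    return s
```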
Optical Path Plans • Emerging “light path” technologies are becoming popular in the Grid community: • They can extend and augment existing grid computing infrastructures, currently focused on CPU/storage, to include the network as an integral Grid component. • These technologies seem to be the most effective way to offer network resource provisioning on-demand between end-systems. • A major capability we are developing in UltraLight is the ability to dynamically switch optical paths across the node, bypassing electronic equipment via a fiber cross-connect (see the sketch below). • The ability to switch dynamically provides additional functionality and also models the more abstract case where switching is done between colors (ITU grid lambdas).
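A purely hypothetical sketch of the fiber cross-connect idea described above: it models a photonic switch as a port-to-port mapping, so an end-to-end light path is just a chain of such mappings set up and torn down without touching the data electronically. This is an abstraction for illustration only, not the Calient or Glimmerglass control API:

```python
# Hypothetical abstraction of a photonic cross-connect; not a real switch API.
class FiberCrossConnect:
    def __init__(self, name: str):
        self.name = name
        self.connections = {}   # input port -> output port

    def connect(self, in_port: int, out_port: int) -> None:
        if out_port in self.connections.values():
            raise ValueError(f"output port {out_port} already in use on {self.name}")
        self.connections[in_port] = out_port

    def disconnect(self, in_port: int) -> None:
        self.connections.pop(in_port, None)

def build_light_path(switches, port_pairs):
    """Set one (in, out) port pair on each switch along the path, in order."""
    for sw, (in_port, out_port) in zip(switches, port_pairs):
        sw.connect(in_port, out_port)
```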
MonALISA to Manage Light Paths • Dedicated modules to monitor and control optical switches • Used to control • CALIENT switch @ CIT • GLIMMERGLASS switch @ CERN • ML agent system • Used to create global paths • Algorithm can be extended to include prioritization and pre-allocation
Monitoring for UltraLight • Network monitoring is essential for UltraLight. • We need to understand our network infrastructure and track its performance, both historically and in real time, to enable the network as a managed, robust component of our overall infrastructure. • There are two ongoing efforts we are leveraging to help provide the monitoring capability required: IEPM http://www-iepm.slac.stanford.edu/bw/ and MonALISA http://monalisa.cern.ch • We are also looking at new tools like perfSONAR which may help provide a monitoring infrastructure for UltraLight.
MonALISA UltraLight Repository • The UL repository: http://monalisa-ul.caltech.edu:8080/
End-Systems Performance • Latest disk-to-disk over 10 Gbps WAN: 4.3 Gbits/sec (536 MB/sec) - 8 TCP streams from CERN to Caltech; windows, 1 TB file, 24 JBOD disks • Quad Opteron AMD 848 2.2 GHz processors with 3 AMD-8131 chipsets: 4 64-bit/133 MHz PCI-X slots • 3 Supermicro Marvell SATA disk controllers + 24 x 7200 rpm SATA disks • Local disk I/O – 9.6 Gbits/sec (1.2 GBytes/sec read/write, with <20% CPU utilization) • 10 GE NIC – 7.5 Gbits/sec (memory-to-memory, with 52% CPU utilization) • 2 x 10 GE NIC (802.3ad link aggregation) – 11.1 Gbits/sec (memory-to-memory) • Need PCI-Express, TCP offload engines • Need 64-bit OS? Which architectures and hardware? • Discussions are underway with 3Ware, Myricom and Supermicro to try to prototype viable servers capable of driving 10 GE networks in the WAN.
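As a quick consistency check of the rates quoted above (standard unit arithmetic, decimal megabytes assumed):

```python
# Quick consistency check of the quoted figures (decimal units: 1 MB = 1e6 bytes).
def gbps_to_mbytes_per_s(gbps: float) -> float:
    return gbps * 1e9 / 8 / 1e6

print(gbps_to_mbytes_per_s(4.3))   # ~537 MB/s, matching the quoted 536 MB/sec WAN rate
print(gbps_to_mbytes_per_s(9.6))   # ~1200 MB/s local disk I/O, i.e. 1.2 GBytes/sec
```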
UltraLight Global Services • Global Services support management and co-scheduling of multiple resource types, and provide strategic recovery mechanisms from system failures • Schedule decisions based on CPU, I/O, Network capability and End-to-end task performance estimates, incl. loading effects • Decisions are constrained by local and global policies • Implementation: Autodiscovering, multithreaded services, service-engines to schedule threads, making the system scalable and robust • Global Services Consist of: • Network and System Resource Monitoring, to provide pervasive end-to-end resource monitoring info. to HLS • Network Path Discovery and Construction Services, to provide network connections appropriate (sized/tuned) to the expected use • Policy Based Job Planning Services, balancing policy, efficient resource use and acceptable turnaround time • Task Execution Services, with job tracking user interfaces, incremental re-planning in case of partial incompletion • These types of services are required to deliver a managed network. Work along these lines is planned for OSG and future proposals to NSF and DOE.
UltraLight Application in 2008 • Node1> fts -vvv -in mercury.ultralight.org:/data01/big/zmumu05687.root -out venus.ultralight.org:/mstore/events/data -prio 3 -deadline +2:50 -xsum • FTS: Initiating file transfer setup… • FTS: Remote host responds ready • FTS: Contacting path discovery service • PDS: Path discovery in progress… • PDS: Path RTT 128.4 ms, best effort path bottleneck is 10 GE • PDS: Path options found: • PDS: Lightpath option exists end-to-end • PDS: Virtual pipe option exists (partial) • PDS: High-performance protocol capable end-systems exist • FTS: Requested 1.2 TB file transfer within 2 hours 50 minutes, priority 3 • FTS: Remote host confirms available space for DN=smckee@ultralight.org • FTS: End-host agent contacted…parameters transferred • EHA: Priority 3 request allowed for DN=smckee@ultralight.org • EHA: request scheduling details • EHA: Lightpath prior scheduling (higher/same priority) precludes use • EHA: Virtual pipe sizeable to 3 Gbps available for 1 hour starting in 52.4 minutes • EHA: request monitoring prediction along path • EHA: FAST-UL transfer expected to deliver 1.2 Gbps (+0.8/-0.4) averaged over next 2 hours 50 minutes
EHA: Virtual pipe (partial) expected to deliver 3 Gbps (+0/-0.3) during reservation; variance from unprotected section < 0.3 Gbps 95%CL • EHA: Recommendation: begin transfer using FAST-UL using network identifier #5A-3C1. Connection will migrate to MPLS/QoS tunnel in 52.3 minutes. Estimated completion in 1 hour 22.78 minutes. • FTS: Initiating transfer between mercury.ultralight.org and venus.ultralight.org using #5A-3C1 • EHA: Transfer initiated…tracking at URL: fts://localhost/FTS/AE13FF132-FAFE39A-44-5A-3C1 • EHA: Reservation placed for MPLS/QoS connection along partial path: 3 Gbps beginning in 52.2 minutes: duration 60 minutes • EHA: Reservation confirmed, rescode #9FA-39AF2E, note: unprotected network section included. • <…lots of status messages…> • FTS: Transfer proceeding, average 1.1 Gbps, 431.3 GB transferred • EHA: Connecting to reservation: tunnel complete, traffic marking initiated • EHA: Virtual pipe active: current rate 2.98 Gbps, estimated completion in 34.35 minutes • FTS: Transfer complete, signaling EHA on #5A-3C1 • EHA: Transfer complete received…hold for xsum confirmation • FTS: Remote checksum processing initiated… • FTS: Checksum verified…closing connection • EHA: Connection #5A-3C1 completed…closing virtual pipe with 12.3 minutes remaining on reservation • EHA: Resources freed. Transfer details uploading to monitoring node • EHA: Request successfully completed, transferred 1.2 TB in 1 hour 41.3 minutes (transfer 1 hour 34.4 minutes)
Supercomputing 2005 • The Supercomputing conference (SC05) in Seattle, Washington, held another “Bandwidth Challenge” during the week of Nov 14-18th • A collaboration of high-energy physicists from Caltech, Michigan, Fermilab and SLAC (with help from BNL: thanks Frank and John!) won, achieving a peak network usage of 131 Gbps. • This SC2005 BWC entry from HEP was designed to preview the scale and complexity of data operations among many sites interconnected with many 10 Gbps links
BWC Take-Away Summary • Our collaboration previewed the IT challenges of next-generation science at the High Energy Physics frontier (for the LHC and other major programs): • Petabyte-scale datasets • Tens of national and transoceanic links at 10 Gbps (and up) • 100+ Gbps aggregate data transport sustained for hours; we reached a Petabyte/day transport rate for real physics data (see the check below) • The team set the scale and learned to gauge the difficulty of the global networks and transport systems required for the LHC mission • Set up, shook down and successfully ran the system in < 1 week • Substantive take-aways from this marathon exercise: • An optimized Linux (2.6.12 + FAST + NFSv4) kernel for data transport, after 7 full kernel-build cycles in 4 days • A newly optimized application-level copy program, bbcp, that matches the performance of iperf under some conditions • Extensions of Xrootd, an optimized low-latency file access application for clusters, across the wide area • Understanding of the limits of 10 Gbps-capable systems under stress
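A back-of-the-envelope check of the Petabyte/day figure, using the ~100 Gbps sustained aggregate quoted in the bullet above:

```python
# Back-of-the-envelope check: ~100 Gbps sustained over a full day is roughly a
# petabyte of data moved (decimal units: 1 PB = 1e15 bytes).
gbps_sustained = 100
seconds_per_day = 24 * 3600
bytes_per_day = gbps_sustained * 1e9 / 8 * seconds_per_day
print(f"{bytes_per_day / 1e15:.2f} PB/day")   # -> 1.08 PB/day
```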
UltraLight and ATLAS • UltraLight has deployed and instrumented its network and made good progress toward defining and constructing the needed ‘managed network’ infrastructure. • The developments in UltraLight are targeted at providing needed capabilities and infrastructure for the LHC. • We have some important activities which are ready for additional effort: • Achieving 10GE disk-to-disk transfers using single servers • Evaluating TCP congestion control protocols over UL links • Deploying embryonic network services to further the UL vision • Implementing some forms of MPLS/QoS and Optical Path control as part of standard UltraLight operation • Enabling automated end-host tuning and negotiation • We want to extend the footprint of UltraLight to include as many interested sites as possible to help ensure its developments meet the LHC needs. • Questions?