FAST TCP for Multi-Gbps WAN: Experiments and Applications
Les Cottrell & Fabrizio Coccetti – SLAC
Prepared for Internet2, Washington, April 2003
http://www.slac.stanford.edu/grp/scs/net/talk/fast-i2-apr03.html
Partially funded by the DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM) and by the SciDAC base program.
Outline
• High throughput challenges
• New TCP stacks
• Tests on unloaded (testbed) links
  • Performance of multi-streams
  • Performance of various stacks
• Tests on production networks
  • Stack comparisons with single streams
  • Stack comparisons with multiple streams
  • Fairness
• Where do I find out more?
High Speed Challenges
• After a loss it can take over an hour for stock TCP (Reno) to recover to maximum throughput at 1 Gbits/s
  • i.e. a loss rate of 1 in ~2 Gpkts (3 Tbits), or a BER of 1 in 3.6×10^12
• PCI bus limitations (66 MHz × 64 bit = 4.2 Gbits/s at best)
• At 2.5 Gbits/s and 180 msec RTT a 120 MByte window is required
• Some tools (e.g. bbcp) will not allow a large enough window (bbcp is limited to 2 MBytes)
• Slow start is a problem: at 1 Gbits/s it takes about 5-6 secs on a 180 msec link
  • i.e. if we want 90% of the measurement in the stable (non slow-start) phase, we need to measure for 60 secs
  • and so need to ship >700 MBytes at 1 Gbits/s
[Plot: recovery after a loss, Sunnyvale-Geneva, 1500 Byte MTU, stock TCP]
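As a rough illustration of the first bullets, here is a minimal back-of-the-envelope sketch (not from the talk) of the window a path needs and of how long Reno's additive increase takes to climb back after halving its window. The 1 Gbits/s rate, 180 msec RTT and 1500 Byte MTU are taken from the slide; the formulas are the standard bandwidth-delay product and the one-MSS-per-RTT congestion-avoidance approximation, so the recovery time is only a lower bound (a timeout or deeper backoff makes it much longer, consistent with the "over an hour" observation).

```python
# Back-of-the-envelope numbers behind the slide: window needed to fill a path
# (bandwidth-delay product) and a lower bound on how long Reno takes to climb
# back after a single loss (it adds ~1 MSS of cwnd per RTT after halving).

def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to keep the pipe full."""
    return rate_bps * rtt_s / 8

def reno_recovery_time_s(rate_bps, rtt_s, mss_bytes=1500):
    """Approximate time for Reno to grow from cwnd/2 back to cwnd,
    adding one MSS per RTT (congestion avoidance)."""
    cwnd_pkts = bdp_bytes(rate_bps, rtt_s) / mss_bytes
    return (cwnd_pkts / 2) * rtt_s

if __name__ == "__main__":
    rate, rtt = 1e9, 0.180          # 1 Gbits/s, 180 msec RTT (Sunnyvale-Geneva)
    print(f"window   ~ {bdp_bytes(rate, rtt) / 1e6:.0f} MBytes")
    print(f"recovery ~ {reno_recovery_time_s(rate, rtt) / 60:.0f} minutes (lower bound)")
```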
New TCP Stacks
• Reno (AIMD) based: loss indicates congestion
  • Back off less when congestion is seen
  • Recover more quickly after backing off
  • Scalable TCP: exponential recovery
    • Tom Kelly, "Scalable TCP: Improving Performance in Highspeed Wide Area Networks", submitted for publication, December 2002
  • High Speed TCP: same as Reno at low performance, then increases the window more and more aggressively (using a table) as the window grows
• Vegas based: RTT indicates congestion
  • Caltech FAST TCP: quicker response to congestion, but …
[Plot: response of Standard, Scalable and High Speed TCP; cwnd = 38 pkts ≈ 0.5 Mbits]
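To make the difference concrete, below is a minimal sketch (assumptions: packet-counted windows, and the a = 0.01, b = 0.125 constants from Kelly's Scalable TCP paper) of the per-ACK and per-loss window updates behind the loss-based stacks above; High Speed TCP keeps the same structure but takes its increase/decrease parameters from a table indexed by the current window.

```python
# Per-ACK / per-loss congestion-window updates that distinguish the loss-based
# stacks on this slide (packet-counting form).

def reno_on_ack(cwnd):        # additive increase: ~1 packet per RTT
    return cwnd + 1.0 / cwnd

def reno_on_loss(cwnd):       # multiplicative decrease: halve
    return cwnd / 2.0

def scalable_on_ack(cwnd):    # multiplicative increase -> exponential recovery
    return cwnd + 0.01

def scalable_on_loss(cwnd):   # back off by only 1/8
    return cwnd * 0.875

if __name__ == "__main__":
    # Illustrative only: RTTs needed to double cwnd again after a halving
    for name, on_ack in [("Reno", reno_on_ack), ("Scalable", scalable_on_ack)]:
        cwnd, rtts = 500.0, 0
        while cwnd < 1000:
            for _ in range(int(cwnd)):   # one RTT's worth of ACKs
                cwnd = on_ack(cwnd)
            rtts += 1
        print(f"{name}: ~{rtts} RTTs to recover")
```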
Typical testbed
[Diagram: Sunnyvale (SNV) – Chicago (CHI) – Amsterdam (AMS) – Geneva (GVA) path, > 10,000 km; OC192/POS (10 Gbits/s) between Sunnyvale and Chicago, 2.5 Gbits/s (EU+US); 7609, T640 and GSR routers; 12*2cpu and 6*2cpu servers plus 4 disk servers at the ends. Sunnyvale section deployed for SC2002 (Nov 02).]
Testbed Collaborators and Sponsors
• Caltech: Harvey Newman, Steven Low, Sylvain Ravot, Cheng Jin, Xiaoling Wei, Suresh Singh, Julian Bunn
• SLAC: Les Cottrell, Gary Buhrmaster, Fabrizio Coccetti
• LANL: Wu-chun Feng, Eric Weigle, Gus Hurwitz, Adam Englehart
• NIKHEF/UvA: Cees DeLaat, Antony Antony
• CERN: Olivier Martin, Paolo Moroni
• ANL: Linda Winkler
• DataTAG, StarLight, TeraGrid, SURFnet, NetherLight, Deutsche Telekom, Information Society Technologies
• Cisco, Level(3), Intel
• DoE, European Commission, NSF
Windows and Streams
• It is well accepted that multiple streams (n) and/or big windows are important to achieve optimal throughput
• Multiple streams effectively reduce the impact of a loss by 1/n and improve the recovery time by 1/n (see the sketch below)
• The optimum windows & streams change as the path changes (e.g. with utilization), so n is hard to optimize
• Can be unfriendly to others
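A minimal sketch (illustrative numbers, not measurements from the talk) of the 1/n argument: with the aggregate window striped over n streams, a single loss halves only one stream's share of the rate, and that stream regains the lost ground n times faster.

```python
# Rough model of why striping a transfer over n TCP streams helps: each stream
# carries only 1/n of the total window, so one loss halves 1/n of the aggregate
# rate, and the affected stream recovers its (smaller) window n times faster.

def loss_impact_fraction(n_streams):
    """Fraction of aggregate throughput lost when one stream halves its cwnd."""
    return 0.5 / n_streams

def recovery_time_s(total_window_pkts, n_streams, rtt_s):
    """Time for the affected stream to regain (its window)/2 at 1 MSS per RTT."""
    per_stream = total_window_pkts / n_streams
    return (per_stream / 2) * rtt_s

if __name__ == "__main__":
    window_pkts, rtt = 15000, 0.180   # ~1 Gbits/s path, 180 msec RTT, 1500B MTU
    for n in (1, 4, 16):
        print(f"n={n:2d}: lose {loss_impact_fraction(n):.1%} of rate, "
              f"recover in ~{recovery_time_s(window_pkts, n, rtt):.0f} s")
```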
Even with big windows (1MB), multiple streams are still needed with Standard TCP
• ANL, Caltech & RAL reach a knee (between 2 and 24 streams); above this the gain in throughput is slow
• Above the knee, performance still improves slowly, perhaps because the large number of streams squeezes out other traffic and takes more than a fair share
Stock vs FAST TCP, MTU = 1500B
• Need to measure all parameters to understand the effects of each parameter and configuration:
  • windows, streams, txqueuelen, TCP stack, MTU, NIC card
  • a lot of variables
• Examples of 2 TCP stacks
• FAST TCP no longer needs multiple streams; this is a major simplification (reduces the number of variables to tune by 1)
[Plots: Stock TCP, 1500B MTU, 65 ms RTT; FAST TCP, 1500B MTU, 65 ms RTT]
TCP stacks with 1500B MTU @ 1 Gbits/s
[Plot: throughput of the TCP stacks; txqueuelen noted]
Jumbo frames, new TCP stacks at 1 Gbits/s (SNV-GVA)
• But: jumbos are not part of the GE or 10GE standards
• Not widely deployed in end networks
Production network tests
• All 6 hosts have 1GE interfaces (2 SLAC hosts send simultaneously)
• Competing flows, no jumbos
• One host runs the "new" TCP stack, another runs Reno TCP
[Diagram: paths from SLAC/Stanford over CalREN, Abilene, ESnet and SURFnet (SNV, SEATTLE, CHI, AMS, GVA; OC12/OC48/OC192 links) to remote hosts at CERN (RTT = 202 ms), NIKHEF (RTT = 158 ms), Caltech (RTT = 25 ms) and APAN (RTT = 147 ms)]
High Speed TCP vs Reno – 1 stream
• 2 separate hosts @ SLAC sending simultaneously to 1 receiver (2 iperf processes), 8MB window, pre-flushed TCP configuration, 1500B MTU
• RTT is bursty = congestion?
• Checked Reno vs Reno from 2 hosts and the results were very similar, as expected
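For reference, a minimal sketch (not the authors' actual test harness) of driving this kind of iperf memory-to-memory test from Python: fixed TCP window, fixed duration, optional parallel streams. The host name is a placeholder and iperf must be installed on both ends; the flags shown are standard iperf (v2) client options.

```python
# Drive one iperf client test of the kind described on these slides and
# return its text report.
import subprocess

def run_iperf(server, window="8M", duration=60, streams=1):
    cmd = ["iperf", "-c", server,      # client mode, connect to `server`
           "-w", window,               # requested TCP window (e.g. 8 MBytes)
           "-t", str(duration),        # test length in seconds
           "-P", str(streams),         # number of parallel streams
           "-f", "m"]                  # report in Mbits/s
    return subprocess.run(cmd, capture_output=True, text=True).stdout

if __name__ == "__main__":
    # e.g. a single-stream, 60 s test toward a (hypothetical) receiver
    print(run_iperf("receiver.example.cern.ch", window="8M", duration=60))
```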
Scalable vs multi-streams: SLAC to CERN, duration 60 s, RTT 207 ms, 8MB window
FAST & Scalable vs. Multi-stream Reno (SLAC>CERN ~230ms) • Bottleneck capacity 622Mbits/s • For short duration, very noisy, hard to distinguish Congestion events often sync Reno 1 streams 87 Mbits/s average FAST 1 stream 244 Mbits/s average Reno 8 streams 150 Mbits/s average FAST 1 stream 200 Mbits/s average
Fairness: FAST vs Reno
• 1 stream, 16MB window, SLAC to CERN
• Reno alone: 221 Mbits/s
• FAST alone: 240 Mbits/s
• Competing: Reno 45 Mbits/s & FAST 285 Mbits/s
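One common way (not used on the slide itself) to summarize sharing like this in a single number is Jain's fairness index, J = (Σx_i)² / (n·Σx_i²), which equals 1 for perfectly equal shares. A minimal sketch using the competing throughputs above:

```python
# Jain's fairness index over a set of per-flow throughputs.
def jain_index(rates):
    n = len(rates)
    return sum(rates) ** 2 / (n * sum(r * r for r in rates))

if __name__ == "__main__":
    # Throughputs from the slide: Reno and FAST competing over SLAC -> CERN
    print(f"Reno vs FAST: J = {jain_index([45, 285]):.2f}")   # ~0.65
    print(f"Equal shares: J = {jain_index([165, 165]):.2f}")  # 1.00
```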
Summary (very preliminary)
• With a single flow & an empty network:
  • Can saturate 2.5 Gbits/s with standard TCP & jumbos
  • Can saturate 1 Gbits/s with new stacks & 1500B frames, or with standard TCP & jumbos
• With a production network:
  • FAST can take a while to get going
  • Once going, FAST TCP with one stream looks good compared to multi-stream Reno
  • FAST can back down early compared to Reno
  • More work is needed on fairness
  • Scalable does not look as good vs. multi-stream Reno
What's next?
• Go beyond 2.5 Gbits/s
• Disk-to-disk throughput & useful applications
  • Need faster CPUs (disk-to-disk needs an extra ~60% MHz per Mbits/s over plain TCP); understand how to use multi-processors
• Further evaluate new stacks with real-world links and other equipment
  • Other NICs
  • Response to congestion, pathologies
  • Fairness
• Deploy for some major (e.g. HENP/Grid) customer applications
• Understand how to make 10GE NICs work well with 1500B MTUs
• Move from "hero" demonstrations to commonplace use
More Information
• 10GE tests
  • www-iepm.slac.stanford.edu/monitoring/bulk/10ge/
  • sravot.home.cern.ch/sravot/Networking/10GbE/10GbE_test.html
• TCP stacks
  • netlab.caltech.edu/FAST/
  • datatag.web.cern.ch/datatag/pfldnet2003/papers/kelly.pdf
  • www.icir.org/floyd/hstcp.html
• Stack comparisons
  • www-iepm.slac.stanford.edu/monitoring/bulk/fast/
  • www.csm.ornl.gov/~dunigan/net100/floyd.html
  • www-iepm.slac.stanford.edu/monitoring/bulk/tcpstacks/
FAST TCP vs. Reno – 1 stream
• N.B. the RTT curve for Caltech shows why FAST performs poorly against Reno (too polite?)
Scalable vs. Reno – 1 stream: 8MB windows, 2 hosts, competing
Other high speed gotchas
• Large windows and a large number of streams can cause the last stream to take a long time to close
• Linux memory leak
• Linux TCP configuration caching
• What window size is actually used/reported?
• 32-bit counters in iperf and routers wrap; need the latest releases with 64-bit counters
• Effects of txqueuelen (the number of packets queued for the NIC)
• Routers that do not pass jumbos
• Performance differs between drivers and NICs from different manufacturers
• May require tuning a lot of parameters (a few of the Linux knobs are shown in the sketch below)
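A minimal sketch (not from the talk) of inspecting a few of the Linux knobs mentioned above: the socket-buffer limits that bound the usable TCP window and the NIC's txqueuelen. The /proc and /sys paths are the standard Linux locations and "eth0" is a placeholder interface name.

```python
# Read a few of the Linux tuning knobs relevant to high-speed TCP transfers.

def read(path):
    with open(path) as f:
        return f.read().strip()

if __name__ == "__main__":
    # Maximum socket buffer sizes (bound the usable TCP window)
    print("net.core.rmem_max =", read("/proc/sys/net/core/rmem_max"))
    print("net.core.wmem_max =", read("/proc/sys/net/core/wmem_max"))
    # min/default/max autotuning limits for TCP buffers
    print("net.ipv4.tcp_rmem =", read("/proc/sys/net/ipv4/tcp_rmem"))
    print("net.ipv4.tcp_wmem =", read("/proc/sys/net/ipv4/tcp_wmem"))
    # Packets queued for the NIC (txqueuelen)
    print("eth0 txqueuelen   =", read("/sys/class/net/eth0/tx_queue_len"))
```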