Hybrid network traffic engineering system (HNTES)
Zhenzhen Yan, M. Veeraraghavan, Chris Tracy
University of Virginia, ESnet
June 23, 2011
Please send feedback/comments to: zy4d@virginia.edu, mv5g@virginia.edu, ctracy@es.net
This work was carried out as part of a sponsored research project from the US DOE ASCR program office on grant DE-SC002350
Outline • Problem statement • Solution approach • HNTES 1.0 and HNTES 2.0 (ongoing) • ESnet-UVA collaborative work • Future work: HNTES 3.0 and integrated network • Project web site: http://www.ece.virginia.edu/mv/research/DOE09/index.html
Problem statement • A hybrid network is one that supports both IP-routed and circuit services on: • Separate networks, as in ESnet4, or • An integrated network • A hybrid network traffic engineering system (HNTES) is one that moves data flows between these two services as needed • i.e., it engineers the traffic to use the service type appropriate to the traffic type
Two reasons for using circuits • Offer scientists rate-guaranteed connectivity • necessary for low-latency/low-jitter applications such as remote instrument control • provides low-variance throughput for file transfers • Isolate science flows from general-purpose flows
Role of HNTES • HNTES is a network management system; if proven, it would be deployed in networks that offer both IP-routed and circuit services
Outline • Problem statement • Solution approach • Tasks executed by HNTES • HNTES architecture • HNTES 1.0 vs. HNTES 2.0 • HNTES 2.0 details • ESnet-UVA collaborative work • Future work: HNTES 3.0 and integrated network
Three tasks executed by HNTES • 1. Flow identification (online in HNTES 1.0: detection upon flow arrival) • 2. Circuit setup • 3. Flow redirection onto the circuit via policy-based routing (PBR)
HNTES 1.0 architecture • Offline flow analysis populates the MFDB • RCIM reads the MFDB and programs routers to port-mirror packets from MFDB flows • Router mirrors packets to the FMM • FMM asks IDCIM to initiate circuit setup as soon as it receives packets from the router corresponding to one of the MFDB flows • IDCIM communicates with the IDC, which sets up the circuit and the PBR entry for flow redirection to the newly established circuit
Heavy-hitter flows • Dimensions • size (bytes): elephant and mice • rate: cheetah and snail • duration: tortoise and dragonfly • burstiness: porcupine and stingray • Kun-chan Lan and John Heidemann, A measurement study of correlations of Internet flow characteristics. ACM Comput. Netw. 50, 1 (January 2006), 46-62.
HNTES 1.0 vs. HNTES 2.0 • HNTES 1.0 focus: DYNAMIC (or online) circuit setup • IDC circuit setup delay is about 1 minute • Can therefore use circuits only for long-DURATION flows • HNTES 1.0 logic (figure)
Rationale for HNTES 2.0 • Why the change in focus? • Size is the dominant dimension of heavy-hitter flows in ESnet • Large sized (elephant) flows have negative impact on mice flows and jitter-sensitive real-time audio/video flows • Do not need to assign individual circuits for elephant flows • Flow monitoring module impractical if all data packets from heavy-hitter flows are mirrored to HNTES
HNTES 2.0 solution • Task 1: offline algorithm for elephant flow identification - add/delete flows from MFDB • Nightly analysis of MFDB for new flows (also offline) • Task 2: IDCIM initiates provisioning of rate-unlimited static MPLS LSPs for new flows if needed • Task 3: RCIM configures PBR in routers for new flows • HNTES 2.0 does not use FMM • MFDB: Monitored Flow Data Base • IDCIM: IDC Interface Module • RCIM: Router Control Interface Module • FMM: Flow Monitoring Module
HNTES 2.0: use rate-unlimited static MPLS LSPs • With rate-limited LSPs: if the PNNL router needs to send elephant flows to 50 other ESnet routers, the 10 GigE interface has to be shared among 50 LSPs (see the worked example below) • A low per-LSP rate will decrease elephant-flow file transfer throughput • With rate-unlimited LSPs, science flows enjoy the full interface bandwidth • Given the low arrival rate of science flows, the probability of two elephant flows simultaneously sharing link resources, though non-zero, is small; even when this happens, theoretically, they should each receive a fair share • No micromanagement of circuits per elephant flow • Rate-unlimited virtual circuits are feasible with MPLS technology • Removes the need to estimate circuit rate and duration • Figure: PNNL-located ESnet PE router connected by a 10 GigE interface to PNWG-cr1 (ESnet core router), with LSPs 1 through 50 to site PE routers
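A small worked example of the bandwidth-sharing argument above (a Python sketch; the 10 GigE interface and the count of 50 LSPs come from the slide, while the even split between two colliding elephant flows is a theoretical assumption):

interface_gbps = 10.0   # 10 GigE interface at the PNNL-located PE router
num_lsps = 50           # one LSP per remote site PE router

per_lsp_rate = interface_gbps / num_lsps   # rate-limited case: each LSP capped at an equal share
print(f"rate-limited LSPs: {per_lsp_rate:.1f} Gb/s per elephant flow")            # 0.2 Gb/s
print(f"rate-unlimited LSPs: up to {interface_gbps:.0f} Gb/s per elephant flow;"
      f" ~{interface_gbps / 2:.0f} Gb/s each if two flows happen to collide")     # theoretical fair share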
HNTES 2.0 Monitored flow database (MFDBv2) • Flow analysis table • Identified elephant flows table • Existing circuits table
HNTES 2.0 Task 1: Flow analysis table • Definition of “flow”: source/destination IP address pair (ports not used) • Sum the sizes of all flow records for a flow over, say, one day • Add flows with total size > threshold (e.g., 1 GB) to the flow analysis table • Enter 0 if the flow's size on any day after it first appears is < threshold • Enter NA for the days before it first appears as a > threshold sized flow • Sliding window: a fixed number of days (e.g., 30) over which the analysis is performed (see the sketch below)
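A minimal Python sketch of how such a flow analysis table could be built from already-parsed Netflow records; the record format, function name, and the handling of days with no records at all are assumptions, not the project's OFAT code:

from collections import defaultdict

SIZE_THRESHOLD = 1e9   # 1 GB summed over one day

def build_flow_analysis_table(records):
    """records: iterable of (day, src_ip, dst_ip, nbytes) tuples derived from Netflow.
    Returns {(src, dst): {day: total_bytes or 0}}; days before a flow's first
    appearance are simply absent, which plays the role of the NA entries."""
    daily = defaultdict(float)                       # (flow, day) -> bytes summed over that day
    for day, src, dst, nbytes in records:
        daily[(src, dst), day] += nbytes
    table = {}
    for (flow, day), size in sorted(daily.items(), key=lambda kv: kv[0][1]):
        if flow in table:                            # flow already appeared on an earlier day
            table[flow][day] = size if size > SIZE_THRESHOLD else 0.0
        elif size > SIZE_THRESHOLD:                  # first day the flow crosses the threshold
            table[flow] = {day: size}
    # (a second pass could also enter 0 for later days with no Netflow records at all)
    return table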
HNTES 2.0 Task 1: Identified elephant flows table • Sort flows in the flow analysis table by a metric • Metric: weighted sum of • a persistency measure • a size measure • Persistency measure: percentage of days in which the size is non-zero, out of the days for which data is available • Size measure: average per-day size (over the days for which data is available) divided by the maximum such value among all flows • Set a threshold for the weighted-sum metric and drop flows whose metric is smaller than the threshold • This limits the number of rows in the identified elephant flows table (see the sketch below)
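Continuing the sketch above, the weighted-sum metric could be computed as follows; the weights and the metric threshold shown are illustrative assumptions, not the project's values:

def identify_elephants(table, w_p=0.5, w_s=0.5, metric_threshold=0.3):
    """table: output of build_flow_analysis_table(); returns flows sorted by the
    weighted-sum metric, keeping only those at or above metric_threshold."""
    stats = {}
    for flow, row in table.items():
        sizes = list(row.values())                                   # days for which data is available
        persistency = sum(1 for s in sizes if s > 0) / len(sizes)    # fraction of non-zero days
        avg_size = sum(sizes) / len(sizes)                           # average per-day size
        stats[flow] = (persistency, avg_size)
    max_avg = max(avg for _, avg in stats.values())                  # normalizer for the size measure
    elephants = []
    for flow, (persistency, avg_size) in stats.items():
        metric = w_p * persistency + w_s * (avg_size / max_avg)      # weighted sum of the two measures
        if metric >= metric_threshold:
            elephants.append((flow, metric))
    return sorted(elephants, key=lambda fm: fm[1], reverse=True)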
Sensitivity analysis • Size threshold, e.g., 1GB • Period for summation of sizes, e.g., 1 day • Sliding window, e.g., 30 days • Value for weighted sum metric
Is HNTES 2.0 sufficient? • Will depend on persistency measure • if many new elephant flows appear each day, need a complementary online solution • Online Flow Monitoring Module (FMM)
Outline • Problem statement • Solution approach • HNTES 1.0 and HNTES 2.0 (ongoing) • ESnet-UVA collaborative work • Netflow data analysis • Validation of Netflow based size estimation • Effect of elephant flows • SNMP measurements • OWAMP data analysis • GridFTP transfer log data analysis • Future work: HNTES 3.0 and integrated network
Netflow data analysis • Zhenzhen Yan coded OFAT (Offline flow analysis tool) and R program for IP address anonymization • Chris Tracy is executing OFAT on ESnet Netflow data and running the anonymization R program • Chris will provide UVA Flow Analysis table with anonymized IP addresses • UVA will analyze flow analysis table with R programs, and create identified elephant flows table • If high persistency measure, then offline solution is suitable; if not, need HNTES 3.0 and FMM!
Findings: NERSC-mr2, April 2011 (one month of data) • Persistency measure = ratio of (number of days in which flow size > 1 GB) to (number of days from when the flow first appears) • Total number of flows = 2281 • Number of flows that had > 1 GB transfers every day = 83
Data doors • Number of flows from NERSC data doors = 84 (3.7% of flows) • Mean persistency ratio of data door flows = 0.237 • Mean persistency ratio of non-data door flows = 0.197 • The new-flows graph is right skewed, suggesting the offline approach may be good enough (just one month; need more months' data analysis) • The persistency measure is also right skewed, suggesting an online approach may be needed
Validation of size estimation from Netflow data • Hypothesis • Flow size from concatenated Netflow records for one flow can be multiplied by 1000 (since the ESnet Netflow sampling rate is 1 in 1000 packets) to estimate actual flow size
Experimental setup • GridFTP transfers of 100 MB, 1GB, 10 GB files • sunn-cr1 and chic-cr1 Netflow data used • Chris Tracy set up this experiment
Flow size estimation experiments • Workflow inner loop (executed 30 times): • obtain initial value of firewall counters at sunn-cr1 and chic-cr1 routers • start GridFTP transfer of a file of known size • from GridFTP logs, determine data connection TCP port numbers • read firewall counters at the end of the transfer • wait 300 seconds for Netflow data to be exported • Repeat experiment 400 times for 100MB, 1 GB and 10 GB file sizes • Chris Tracy ran the experiments
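A rough Python sketch of one inner-loop iteration of this workflow; read_firewall_counters() and data_ports_from_gridftp_log() are hypothetical placeholders for the actual scripts, and the globus-url-copy invocation is only illustrative:

import subprocess, time

def read_firewall_counters(router):
    # placeholder: in the real experiment these counters were read from the router
    return {"bytes": 0, "packets": 0}

def data_ports_from_gridftp_log(logfile):
    # placeholder: parse the GridFTP log for the data-connection TCP port numbers
    return []

def one_iteration(src_url, dst_url, logfile):
    before = {r: read_firewall_counters(r) for r in ("sunn-cr1", "chic-cr1")}
    subprocess.run(["globus-url-copy", src_url, dst_url], check=True)   # GridFTP transfer of a known-size file
    ports = data_ports_from_gridftp_log(logfile)                        # data-connection TCP ports
    after = {r: read_firewall_counters(r) for r in ("sunn-cr1", "chic-cr1")}
    time.sleep(300)                                                     # wait for Netflow records to be exported
    return before, after, ports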
Create log files • Filter out GridFTP flows from Netflow data • For each transfer, find packet counts and byte counts from all the flow records and add • Multiply by 1000 (1-in-1000 sampling rate) • Output the byte and packet counts from the firewall counters • Size-accuracy ratio = Size computed from Netflow data divided by size computed from firewall counters • Chris Tracy wrote scripts to create these log files and gave UVA these files for analysis
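The size-accuracy ratio itself reduces to a one-line calculation; a small sketch (the 1-in-1000 sampling rate is from the slides, the function name is an assumption):

SAMPLING_RATE = 1000   # ESnet Netflow samples 1 in 1000 packets

def size_accuracy_ratio(netflow_bytes, firewall_bytes):
    """netflow_bytes: bytes summed over all Netflow records for one transfer;
    firewall_bytes: bytes counted by the router firewall filter for the same transfer."""
    estimated_size = netflow_bytes * SAMPLING_RATE   # scale up for packet sampling
    return estimated_size / firewall_bytes           # close to 1 when the estimate is accurate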
Size-accuracy ratio • Sample mean shows a size-accuracy ratio close to 1 • Standard deviation is smaller for larger files. • Dependence on traffic load • Sample size = 50 • Zhenzhen Yan analyzed log files
Outline • Problem statement • Solution approach • HNTES 1.0 and HNTES 2.0 (ongoing) • ESnet-UVA collaborative work • Netflow data analysis • Validation of Netflow based size estimation • Effect of elephant flows • SNMP measurements • OWAMP data analysis • GridFTP log analysis • Future work: HNTES 3.0 and integrated network
Effect of elephant flows on link loads • SNMP link load averaged over 30 sec • Five 10 GB GridFTP transfers • Dashed lines: rest of the traffic load • Figure: CHIC-cr1 and SUNN-cr1 interface SNMP load plots (y-axis labels at 2.5 and 10 Gb/s; x-axis in 1-minute intervals) • Plots: Chris Tracy
OWAMP (one-way ping) • One-Way Active Measurement Protocol (OWAMP) • 9 OWAMP servers across Internet2 (72 pairs) • System clocks are synchronized • The “latency hosts” (nms-rlat) are dedicated solely to OWAMP • 20 packets per second on average (10 for IPv4, 10 for IPv6) for each OWAMP server pair • Raw data for 2 weeks obtained for all pairs
Study of “surges” (consecutive higher OWAMP delays on a 1-minute basis) • Steps (see the sketch below): • Find the 10th percentile delay b across the 2-week data set • Find the 10th percentile delay i for each minute • If i > n × b, i is considered a surge point (n = 1.1, 1.2, 1.5) • Consecutive surge points are combined into a single surge
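A minimal Python sketch of this surge-detection procedure, assuming the OWAMP delays have already been grouped by minute; the data layout and function name are assumptions:

import numpy as np

def find_surges(per_minute_delays, n=1.2):
    """per_minute_delays: list of (minute_index, array_of_one_way_delays) pairs.
    Returns a list of (start_minute, duration_in_minutes) surges."""
    all_delays = np.concatenate([d for _, d in per_minute_delays])
    b = np.percentile(all_delays, 10)                  # baseline: 10th percentile over the whole data set
    surge_minutes = [m for m, d in per_minute_delays
                     if np.percentile(d, 10) > n * b]  # per-minute 10th percentile exceeds n * b
    surges, start, prev = [], None, None
    for m in surge_minutes:                            # combine consecutive surge points into one surge
        if start is None:
            start, prev = m, m
        elif m == prev + 1:
            prev = m
        else:
            surges.append((start, prev - start + 1))
            start, prev = m, m
    if start is not None:
        surges.append((start, prev - start + 1))
    return surges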
Study of surges cont. • Sample absolute values of 10th percentile delays
PDF of surge duration • One surge lasted for 200 min • The median value is 34 min
95th percentile per minute • The 95th percentile delay per minute was 4.13 (CHIC-LOSA), 10.1 (CHIC-KANS) and 5.4 (HOUS-LOSA) times the one-way propagation delay
Future workDetermine cause(s) of surges • Host (OWAMP server) issues? • In addition to OWAMP pings, OWAMP server pushes measurements to Measurement Archive at IU • Interference from BWCTL at HP LAN switch within PoP? • Correlate BWCTL logs with OWAMP delay surges • Router buffer buildups due to elephant flows • Correlate Netflow data with OWAMP delay surges • If none of above, then surges due to router buffer buildups resulting from multiple simultaneous mice flows
GridFTP data analysis findings • All GridFTP transfers from NERSC GridFTP servers that were > 100 MB: one month (Sept. 2010) • Total number of transfers: 124236 • Data from GridFTP logs
Throughput of GridFTP transfers • Total number of transfers: 124236 • Most transfers get about 50 MB/sec or 400 Mb/s
Variability in throughput for files of the same size • There were 145 file transfers of size 34359738368 bytes (approx. 34 GB) • The IQR (inter-quartile range) measure of variance is 695 Mbps (a sketch of the computation follows) • Need to determine the other end of each transfer and consider the time at which it occurred
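A small sketch of how the throughput IQR could be computed from GridFTP log entries; the tuple format is an assumption:

import numpy as np

def throughput_iqr(transfers):
    """transfers: list of (bytes_transferred, duration_seconds) for one file size.
    Returns the inter-quartile range of throughput in Mb/s."""
    mbps = np.array([8.0 * nbytes / duration / 1e6 for nbytes, duration in transfers])
    return np.percentile(mbps, 75) - np.percentile(mbps, 25)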
Outline • Problem statement • Solution approach • HNTES 1.0 and HNTES 2.0 (ongoing) • ESnet-UVA collaborative work • Future work: HNTES 3.0 and integrated network
HNTES 3.0 • Online flow detection • Packet header based schemes • Payload based scheme • Machine learning schemes • For ESnet • Data door IP address based 0-length (SYN) segment mirroring to trigger PBR entries (if full mesh of LSPs), and LSP setup (if not a full mesh) • PBR can be configured only after finding out the other end’s IP address (data door is one end) • “real-time” analysis of Netflow data • Need validation by examining patterns within each day
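A very rough sketch of the data-door SYN-mirroring idea for HNTES 3.0, assuming the mirrored packets reach a Python/scapy collector; the data-door addresses and the install_pbr() action are hypothetical placeholders (a real deployment would act through RCIM/IDCIM):

from scapy.all import sniff, IP, TCP    # scapy reads the port-mirrored packets

DATA_DOOR_IPS = {"198.51.100.10", "198.51.100.11"}   # hypothetical data-door addresses

def install_pbr(src, dst):
    # placeholder for the RCIM/IDCIM actions that would configure PBR (and an LSP if needed)
    print(f"would redirect {src} <-> {dst} onto a science LSP")

def handle(pkt):
    if IP in pkt and TCP in pkt and pkt[TCP].flags & 0x02:   # 0-length SYN segment
        src, dst = pkt[IP].src, pkt[IP].dst
        if src in DATA_DOOR_IPS or dst in DATA_DOOR_IPS:     # one end is a data door
            install_pbr(src, dst)                            # the other end's IP is now known

sniff(iface="mirror0", prn=handle, store=False)              # "mirror0" is an assumed interface name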
HNTES in an integrated network • Set up two queues on each ESnet physical link, each rate-limited • Two approaches • Approach I: use different DSCP taggings • General purpose: rate limited at 20% of capacity • Science network: rate limited at 80% of capacity • Approach II: IP network + MPLS network • General purpose: same as Approach I • Science network: full mesh of MPLS LSPs mapped to the 80% queue • Ack: Inder Monga
Comparison • In the first solution, there is no easy way to achieve load balancing of science flows • Second solution: • MPLS LSPs are rate-unlimited • Use SNMP measurements to measure the load on each of these LSPs • Obtain a traffic matrix • Run an optimization to load-balance science flows by rerouting LSPs across the whole topology • Science flows will enjoy higher throughput than in the first solution because the TE system can periodically re-adjust the routing of LSPs
Discuss integration with IDC • IDC-established LSPs have rate policing at the ingress router • Not suitable for HNTES-redirected science flows • Add a third queue for this category • Discussion with Chin Guok
Summary • HNTES 2.0 focus • Elephant (large-sized) flows • Offline detection • Rate-unlimited static MPLS LSPs • Offline setting of policy based routes for flow redirection • HNTES 3.0 • Online PBR configuration • Requires flow monitoring module to receive port mirrored packets from routers and execute online flow redirection after identifying other end • HNTES operation in an integrated network